Inside the forward pass: the GPU economics of pre-fill, decode, and serving large language models

Last updated on February 17, 2026 by Editorial Team

Author(s): Utkarsh Mittal

Originally published on Towards AI.

Why inference is the endgame

Pre-training a frontier large language model typically consumes 15 trillion to 30 trillion tokens. That sounds like a very large number, until you do the arithmetic on the inference side. There are approximately 7 to 8 billion people on the planet. If each person sent just one query per day to a model like ChatGPT, and each query consumed about 2,000 tokens once you count both the input prompt and the generated output, that alone would amount to about 14 trillion tokens per day. A single day of modest global usage is roughly equivalent to the entire pre-training token budget. In practice, heavy users send dozens or even hundreds of queries every day. Raise the figure to 100 queries per person per day, and every day you need on the order of 100× more tokens than were used to train the model in the first place.
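The arithmetic above can be checked with a short back-of-envelope sketch. It uses only the round numbers quoted in the text (7 billion users, 2,000 tokens per query, a 15 to 30 trillion token pre-training budget); the function and constant names are hypothetical, chosen for illustration, and none of the inputs are measurements.

```python
# Back-of-envelope comparison of daily inference tokens against a
# pre-training token budget. Every input is a round number taken from
# the text above, not a measurement.

def daily_inference_tokens(users: float, queries_per_user: float,
                           tokens_per_query: float) -> float:
    """Tokens processed per day, counting prompt plus generated output."""
    return users * queries_per_user * tokens_per_query

USERS = 7e9                    # lower end of the 7 to 8 billion estimate
TOKENS_PER_QUERY = 2_000       # prompt + output, per the text
PRETRAIN_TOKENS_LOW = 15e12    # 15 trillion
PRETRAIN_TOKENS_HIGH = 30e12   # 30 trillion

light = daily_inference_tokens(USERS, 1, TOKENS_PER_QUERY)
heavy = daily_inference_tokens(USERS, 100, TOKENS_PER_QUERY)

print(f"1 query/person/day:    {light / 1e12:,.0f}T tokens per day")
print(f"100 queries/person/day: {heavy / 1e12:,.0f}T tokens per day")
print(f"heavy usage vs. 30T budget: {heavy / PRETRAIN_TOKENS_HIGH:.0f}x per day")
print(f"heavy usage vs. 15T budget: {heavy / PRETRAIN_TOKENS_LOW:.0f}x per day")
```

With these inputs the heavy-usage case comes out at several dozen to roughly a hundred pre-training budgets per day, depending on which end of each range is assumed, so the "100×" figure is best read as an order of magnitude.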

Figure 1: Prefill stage

This article examines the changing economics of artificial intelligence, with an emphasis on the shift from training to inference, which is becoming increasingly complex and expensive as large language models proliferate. While companies race to improve their training capabilities, the real expense arises from inference, where models have to process trillions of tokens per day. The article walks through how language models such as Llama 3 operate, detailing the steps involved in generating a response, the computational demands of each step, and the challenge of keeping GPUs efficiently utilized. Ultimately, understanding this workflow is essential to optimizing AI deployment and managing the associated costs.

Read the entire blog for free on Medium.


