Analysing LLM Inference Costs


Created: =dateformat(this.file.ctime,"dd MMM yyyy, hh:mm a") | Modified: =dateformat(this.file.mtime,"dd MMM yyyy, hh:mm a") Tags: knowledge


Prefill stage vs Decode stage

  • Prefill is when the initial prompt is fed through the model in a single pass → tends to be compute bound
    • determines “time to first token”
    • compute bound: the time taken to compute the FLOPs (limited by the GPU’s FLOPS capability) exceeds the time taken to transfer data (weights) from memory
  • Decode stage is autoregressive next-token generation, one token per forward pass → tends to be memory bound
    • memory bound: the time taken to transfer data (weights, KV cache) from memory exceeds the time taken to compute the FLOPs, since far fewer FLOPs are needed per pass at this stage (FLOPs = operation count, FLOPS = operations per second)
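The compute-bound vs memory-bound distinction can be sanity-checked with a back-of-envelope calculation. Below is a minimal sketch: it assumes roughly A100-class hardware numbers (~312 TFLOPS BF16, ~2 TB/s HBM bandwidth), approximates forward-pass FLOPs as 2 × params × tokens, and ignores the KV cache and attention FLOPs for simplicity — the model size and token counts are illustrative, not from this note.

```python
# Back-of-envelope roofline check: compute time vs weight-transfer time.
# Hardware numbers are illustrative (roughly an A100: 312 TFLOPS BF16, ~2 TB/s HBM).
PEAK_FLOPS = 312e12   # FLOP/s
MEM_BW = 2.0e12       # bytes/s

def stage_times(n_params, n_tokens, bytes_per_param=2):
    """Return (compute_time, transfer_time) in seconds for one forward pass.

    Forward-pass FLOPs approximated as 2 * params * tokens; transfer time
    covers reading the weights once (KV cache and activations ignored).
    """
    compute_t = 2 * n_params * n_tokens / PEAK_FLOPS
    transfer_t = n_params * bytes_per_param / MEM_BW
    return compute_t, transfer_t

params = 7e9  # e.g. a 7B-parameter model (illustrative)

# Prefill: a 2048-token prompt in one pass -> compute time dominates.
c, m = stage_times(params, 2048)
print(f"prefill: compute {c*1e3:.1f} ms vs transfer {m*1e3:.1f} ms")

# Decode: one token per pass -> weight transfer dominates (memory bound).
c, m = stage_times(params, 1)
print(f"decode:  compute {c*1e3:.3f} ms vs transfer {m*1e3:.1f} ms")
```

With these numbers, prefill compute time (~92 ms) far exceeds transfer time (~7 ms), while in decode the ~7 ms of weight transfer dwarfs the sub-0.1 ms of compute — which is why batching decode requests amortises the same weight reads across many tokens.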

Articles