Token

LLMs

In one line

The atomic unit an LLM actually reads and writes — usually a sub-word fragment, not a whole word.

What it actually means

When you send text to an LLM, a tokenizer chops it into pieces from a fixed vocabulary (often 30k–200k entries). Common words get one token, rarer words get split into multiple pieces, and unusual characters or whitespace each have their own token. Each token becomes an integer ID, and the model only ever sees those IDs. Output works the same way in reverse: the model produces token IDs one at a time, which the tokenizer turns back into text.
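The mechanics can be sketched with a toy greedy longest-match tokenizer. Real tokenizers (BPE, SentencePiece) learn their vocabularies from data and use merge rules rather than pure longest-match; the vocabulary and IDs below are invented purely for illustration.

```python
# Toy longest-match tokenizer over a small, made-up vocabulary.
# Note that whitespace lives inside tokens (" " is its own entry here),
# just as real tokenizers fold leading spaces into word pieces.
VOCAB = ["un", "believ", "able", "token", "izer", "s", " ", "a", "b", "l", "e"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}

def tokenize(text: str) -> list[str]:
    """Split text into vocabulary pieces, always taking the longest match."""
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, falling back to shorter ones.
        for j in range(len(text), i, -1):
            if text[i:j] in TOKEN_TO_ID:
                pieces.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary entry covers {text[i]!r}")
    return pieces

pieces = tokenize("unbelievable tokenizers")
ids = [TOKEN_TO_ID[p] for p in pieces]
print(pieces)  # ['un', 'believ', 'able', ' ', 'token', 'izer', 's']
print(ids)     # [0, 1, 2, 6, 3, 4, 5]
```

Note how "unbelievable" splits into three pieces while common fragments get one each: the model downstream only ever sees the integer IDs.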

Why it matters

Tokens are the unit of pricing, latency, and the context window. “How long is my prompt?” is meaningless until you count tokens, because three sentences of English and three sentences of JSON can differ by 2× or more. Tokens also explain otherwise-puzzling model behaviour: models struggle with character-level tasks because they never see individual characters, only tokens.
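A back-of-envelope cost estimate makes the point concrete. The 4-characters-per-token heuristic and the per-million-token prices below are illustrative assumptions, not any provider's actual numbers; for real billing you would count tokens with the provider's own tokenizer.

```python
# Rough cost estimate for one LLM call, priced per token.
# All numbers here are made-up assumptions for illustration.
def rough_token_count(text: str) -> int:
    """Crude estimate: English prose averages roughly 4 characters per token."""
    return max(1, len(text) // 4)

def estimate_cost(prompt: str, expected_output_tokens: int,
                  usd_per_1m_input: float = 3.0,
                  usd_per_1m_output: float = 15.0) -> float:
    """Output tokens are typically priced higher than input tokens."""
    input_tokens = rough_token_count(prompt)
    return (input_tokens * usd_per_1m_input
            + expected_output_tokens * usd_per_1m_output) / 1_000_000

# A 4,000-character prompt (~1,000 tokens) with a ~500-token reply:
cost = estimate_cost("x" * 4_000, expected_output_tokens=500)
print(f"${cost:.4f}")  # $0.0105
```

The same estimator also tells you whether a prompt will fit in a context window before you ever send it.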

Example

"Tokenization isn't trivial."
→ ["Token", "ization", " isn", "'t", " trivial", "."]
→ [9856, 2065, 2125, 470, 14276, 13]
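One detail worth noticing in the example above: the pieces carry their own leading whitespace (" isn", " trivial"), so decoding is lossless plain concatenation. A minimal sketch, reusing the token strings from the example (the integer IDs vary by tokenizer and are illustrative):

```python
# Tokens carry their own whitespace, so detokenization is just concatenation.
tokens = ["Token", "ization", " isn", "'t", " trivial", "."]
text = "".join(tokens)
print(text)  # Tokenization isn't trivial.
```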

You’ll hear it when

  • Estimating cost or latency for an LLM call.
  • Hitting a context window limit.
  • Wondering why the model can’t reverse a string or count letters.
  • Comparing tokenizer efficiency across languages (English vs. Thai vs. code).
  • Debugging a chunking strategy for RAG.

Related terms