Understanding LLM Tokens: A Visual Playground

October 5, 2025

When you send a message to an LLM like Claude or ChatGPT, it doesn't process your text directly. It breaks it into smaller pieces called tokens. Understanding tokens is key to understanding how LLMs work, but the concept can be abstract. That's why I built a simple playground to make it tangible.

Try it here: LLM Tokens Playground

What Are Tokens?

Tokens are the basic units that LLMs work with. When you send "How are you today?" to an LLM, it sees something like this:

  • Token strings: ["How"," are"," you"," today","?"]
  • Token IDs: [5299, 553, 481, 4044, 30]

Notice how "are" includes a space before it - modern tokenizers typically attach the space preceding a word to that word's token, so spacing and punctuation become part of the text fragments.
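You can approximate this space-attachment behavior with a simple regex split. This is only a toy sketch - real tokenizers like o200k_base perform a similar pre-split and then apply byte-pair encoding on top of it:

```python
import re

# Toy pre-tokenizer: keep an optional leading space attached to each
# word, and split punctuation into its own piece. This mimics the
# pre-split step of real tokenizers, without the BPE merges.
def pre_tokenize(text):
    return re.findall(r" ?\w+|[^\w\s]", text)

print(pre_tokenize("How are you today?"))
# → ['How', ' are', ' you', ' today', '?']
```

Note how every word after the first carries its leading space, matching the token strings shown above.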

Why Tokens Matter

The LLM doesn't actually see your text. It converts everything to token IDs - integers that represent pieces of text. The model processes these numbers, and when generating a response, it outputs token IDs that get converted back to text.

This matters because:

  • LLMs have token limits (like 200k tokens), not character limits
  • Understanding tokens helps you estimate costs and context usage
  • Some behaviors make more sense when you know how text gets split
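For estimating context usage without running a tokenizer, a common rule of thumb is roughly four characters per token for English prose. A minimal sketch of that heuristic (an approximation only - code, other languages, and unusual text can tokenize very differently):

```python
# Rough heuristic: English text averages about 4 characters per token.
# Use a real tokenizer when you need exact counts; this is just for
# quick back-of-the-envelope estimates of context usage.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

prompt = "How are you today?"
print(estimate_tokens(prompt))  # a rough estimate, not an exact count
```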

The Playground

The LLM Tokens Playground lets you type any text and instantly see:

  • How it's broken into token strings
  • The corresponding token IDs
  • A side-by-side table mapping each ID to its text fragment

It uses the o200k_base tokenizer - the same encoding used by recent OpenAI models - so you're seeing real tokenization.

Try It Yourself

Type different kinds of text to build intuition:

  • Regular sentences
  • Code snippets
  • Numbers and special characters
  • Words in different languages

You'll notice patterns - common words are often single tokens, while unusual words get split into smaller pieces. This is because tokenizers learn from massive text datasets and optimize for common patterns.
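That splitting behavior comes from byte-pair encoding (BPE): training starts from small pieces (characters or bytes) and repeatedly merges the most frequent adjacent pair, so common substrings end up as single tokens. A minimal sketch of the merge step, on toy data rather than a real training corpus:

```python
from collections import Counter

def most_frequent_pair(tokens):
    # Count every adjacent pair of symbols and return the most common one.
    return Counter(zip(tokens, tokens[1:])).most_common(1)[0][0]

def merge_pair(tokens, pair):
    # Replace each occurrence of `pair` with a single merged symbol.
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# Start from characters; frequent pairs merge first, so a common
# substring like "low" quickly becomes a single token.
tokens = list("low low low lower lowest")
for _ in range(2):
    tokens = merge_pair(tokens, most_frequent_pair(tokens))
print(tokens)  # "low" is now one token wherever it appears
```

Rare words simply run out of learned merges, which is why they stay split into several smaller tokens.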

The playground is open source and built with Next.js and the js-tiktoken library.

Learn More

If you're working with LLMs, understanding tokens will help you write better prompts and understand model behavior. Give the playground a try and see how your text gets tokenized.