Understanding LLM Tokens: A Visual Playground

October 5, 2025

When you send a message to an LLM like Claude or ChatGPT, it doesn't process your text directly. It breaks it into smaller pieces called tokens. Understanding tokens is key to understanding how LLMs work, but the concept can be abstract. That's why I built a simple playground to make it tangible.

Try it here: LLM Tokens Playground

What Are Tokens?

Tokens are the basic units that LLMs work with. When you send "How are you today?" to an LLM, it sees something like this:

  • Token strings: ["How"," are"," you"," today","?"]
  • Token IDs: [5299, 553, 481, 4044, 30]

Notice how "are" includes a space before it - modern tokenizers typically attach the space preceding a word to that word's token, so spacing and punctuation become part of the text fragments.
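You can approximate this space-attachment behavior with a simple regex split. This is only a toy sketch - real tokenizers like o200k_base perform a similar pre-split and then apply byte-pair encoding on top of it:

```python
import re

# Toy pre-tokenizer: keep an optional leading space attached to each
# word, and split punctuation into its own piece. This mimics the
# pre-split step of real tokenizers, without the BPE merges.
def pre_tokenize(text):
    return re.findall(r" ?\w+|[^\w\s]", text)

print(pre_tokenize("How are you today?"))
# → ['How', ' are', ' you', ' today', '?']
```

Note how every word after the first carries its leading space, matching the token strings shown above.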

Why Tokens Matter

The LLM doesn't actually see your text. It converts everything to token IDs - integers that represent pieces of text. The model processes these numbers, and when generating a response, it outputs token IDs that get converted back to text.

This matters because:

  • LLMs have token limits (like 200k tokens), not character limits
  • Understanding tokens helps you estimate costs and context usage
  • Some behaviors make more sense when you know how text gets split
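For estimating context usage without running a tokenizer, a common rule of thumb is roughly four characters per token for English prose. A minimal sketch of that heuristic (an approximation only - code, other languages, and unusual text can tokenize very differently):

```python
# Rough heuristic: English text averages about 4 characters per token.
# Use a real tokenizer when you need exact counts; this is just for
# quick back-of-the-envelope estimates of context usage.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

prompt = "How are you today?"
print(estimate_tokens(prompt))  # a rough estimate, not an exact count
```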

The Playground

The LLM Tokens Playground lets you type any text and instantly see:

  • How it's broken into token strings
  • The corresponding token IDs
  • A side-by-side table mapping each ID to its text fragment

It uses the o200k_base tokenizer - the same encoding used by recent OpenAI models - so you're seeing real tokenization.

Try It Yourself

Type different kinds of text to build intuition:

  • Regular sentences
  • Code snippets
  • Numbers and special characters
  • Words in different languages

You'll notice patterns - common words are often single tokens, while unusual words get split into smaller pieces. This is because tokenizers learn from massive text datasets and optimize for common patterns.
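That splitting behavior comes from byte-pair encoding (BPE): training starts from small pieces (characters or bytes) and repeatedly merges the most frequent adjacent pair, so common substrings end up as single tokens. A minimal sketch of the merge step, on toy data rather than a real training corpus:

```python
from collections import Counter

def most_frequent_pair(tokens):
    # Count every adjacent pair of symbols and return the most common one.
    return Counter(zip(tokens, tokens[1:])).most_common(1)[0][0]

def merge_pair(tokens, pair):
    # Replace each occurrence of `pair` with a single merged symbol.
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

# Start from characters; frequent pairs merge first, so a common
# substring like "low" quickly becomes a single token.
tokens = list("low low low lower lowest")
for _ in range(2):
    tokens = merge_pair(tokens, most_frequent_pair(tokens))
print(tokens)  # "low" is now one token wherever it appears
```

Rare words simply run out of learned merges, which is why they stay split into several smaller tokens.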

The playground is open source and built with Next.js and the js-tiktoken library.

Learn More

If you're working with LLMs, understanding tokens will help you write better prompts and understand model behavior. Give the playground a try and see how your text gets tokenized.