How do LLMs work?

A step-by-step guide for engineers who have used large language models and want to understand what happens inside them.

From a toy model to a production LLM, across three milestones.

Start here

You'll need a basic understanding of vectors, matrices, and Python.

Pedagogical LLM


Build a working language model from scratch — no framework, no pretrained weights. You will trace every number through every operation.

Local LLM


Add the components that modern models rely on: attention, efficient positional encoding, and the mixture-of-experts (MoE) layer. By the end, you can read a real model config.

Server LLM


The engineering behind production deployment — FlashAttention, KV cache management, multi-GPU serving, and the full training and post-training pipeline.
