
How Large Language Models Actually Work


SnackIQ Editorial Team

AI & Tech

Feb 16, 2026

4 min read

[Image: Abstract neural network visualisation — how large language models and transformers work]

You've used ChatGPT, Gemini, or Claude. But what's actually happening inside? Large Language Models (LLMs) are frequently described as 'predicting the next word' — which is technically true but profoundly undersells what that actually means. Understanding the mechanism explains both their extraordinary capabilities and their fundamental limitations.

The transformer architecture

In 2017, Google researchers published 'Attention is All You Need' — one of the most cited papers in machine learning history. It introduced the transformer, the architecture behind every major LLM. The key innovation was the attention mechanism: instead of processing words sequentially (as earlier RNNs did), transformers process all words in a sentence simultaneously and learn which words should 'attend' to which others. 'The bank was steep' and 'I went to the bank' need to be understood differently — attention makes this possible by weighing word relationships across the entire context window. The eight researchers who wrote the paper have since founded or joined most of the major AI labs; the architecture they invented underlies GPT-4, Claude, Gemini, and every other frontier model.
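The attention step at the heart of this can be sketched in a few lines of NumPy. This is an illustrative toy, not a real transformer: production models use learned query/key/value projections, multiple attention heads, and positional encodings, all omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: every position weighs every other."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise relevance of each token to each other token
    weights = softmax(scores, axis=-1)   # each row is a probability distribution
    return weights @ V, weights

# Toy example: 3 "words", each a 4-dimensional embedding.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
out, w = attention(x, x, x)  # self-attention; learned projections omitted for brevity
print(w.sum(axis=-1))        # each row of attention weights sums to 1
```

Because every token attends to every other token in one matrix multiply, the two senses of 'bank' can be resolved from whatever context words are present, with no sequential scan.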

What 'training' actually means

Training an LLM involves exposing it to hundreds of billions of words from the internet, books, and code, and repeatedly asking it to predict the next token. Every time it gets a prediction wrong, the model's billions of parameters — the numerical weights determining its outputs — are adjusted slightly. After trillions of such adjustments, the model has implicitly encoded vast amounts of knowledge about language, facts, and reasoning. The result isn't a database of facts — it's a compressed statistical model of human language. OpenAI's GPT-4 has an estimated 1.8 trillion parameters, each a small number representing a learned relationship between concepts. The training compute for frontier models now costs hundreds of millions of dollars.
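Next-token prediction can be illustrated at toy scale with simple counting. Real training uses gradient descent over billions of parameters rather than frequency tables, but the objective — assign probabilities to the next token given what came before — is the same idea.

```python
from collections import Counter, defaultdict

# Count next-token frequencies in a tiny corpus — a crude stand-in for
# what gradient descent does at scale with billions of parameters.
corpus = "the cat sat on the mat the cat ran".split()
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(word):
    """Return a probability distribution over next tokens given the previous one."""
    options = counts[word]
    total = sum(options.values())
    return {w: c / total for w, c in options.items()}

print(predict_next("the"))  # 'cat' is twice as likely as 'mat' in this corpus
```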

Why LLMs hallucinate

LLMs don't retrieve facts from a database — they generate plausible next tokens. This means they can produce confident-sounding text that is factually wrong, because 'sounds like what a correct answer would look like' is what they were trained to produce, not 'is verifiably true'. This isn't a bug to be fixed; it's an intrinsic property of the architecture. Researchers at DeepMind and elsewhere have demonstrated that hallucination rates decrease with model scale and with retrieval-augmented generation (RAG) — connecting the model to live databases of facts it can quote directly rather than generating from memory. But zero hallucination in a generative model is not achievable with current architectures.
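The RAG pattern can be sketched as 'retrieve, then prompt'. Everything below is a toy stand-in — real systems use vector embeddings and a proper retriever rather than keyword overlap, and the document store here is invented for illustration — but the shape of the pipeline is the same.

```python
# A minimal retrieval-augmented generation (RAG) sketch: look up relevant
# facts first, then hand them to the model to quote rather than recall.
DOCUMENTS = [
    "The transformer architecture was introduced in 2017.",
    "RLHF fine-tunes models using human preference rankings.",
    "Attention lets every token weigh every other token in context.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by crude keyword overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(
        DOCUMENTS,
        key=lambda d: len(q & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query: str) -> str:
    """Prepend retrieved facts so the model can cite instead of guess."""
    context = "\n".join(f"- {d}" for d in retrieve(query))
    return f"Answer using only these facts:\n{context}\n\nQuestion: {query}"

print(build_prompt("When was the transformer architecture introduced?"))
```

The key move is in `build_prompt`: the model generates from text it was just handed, rather than from its compressed training-time memory, which is why RAG reduces (but does not eliminate) hallucination.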

RLHF: making models useful and safe

Raw pre-trained LLMs are good at predicting text but not at following instructions or being helpful. Reinforcement Learning from Human Feedback (RLHF), developed at OpenAI and popularised with InstructGPT in 2022, addresses this. Human raters compare model outputs and rank them, training a 'reward model' that predicts human preferences. The LLM is then fine-tuned to maximise this reward signal. This process is what transforms a raw text predictor into a helpful assistant. It's also how safety constraints are implemented: raters mark certain outputs as unacceptable, and the model learns to avoid them. Every instruction-following model — including Claude — uses some variant of this process.
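The heart of reward-model training is a pairwise comparison loss. The sketch below shows a standard Bradley–Terry-style formulation commonly used in RLHF; in practice the reward values come from a neural network scoring full model outputs, not hand-typed numbers.

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """-log sigmoid(r_chosen - r_rejected): small when the reward model
    agrees with the human ranking, large when it disagrees."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Agreement with the human rater yields a small loss; disagreement, a large one.
print(preference_loss(2.0, 0.0))  # small: chosen output scored higher
print(preference_loss(0.0, 2.0))  # large: rejected output scored higher
```

Minimising this loss over many human-ranked pairs produces the reward model; the LLM is then fine-tuned to generate outputs that score highly under it.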

What LLMs genuinely cannot do

Understanding the architecture explains the limitations. LLMs have a fixed context window — they can only process a certain amount of text at once, and have no persistent memory between conversations (unless specifically engineered). They cannot access real-time information without tool use. They are susceptible to adversarial inputs ('prompt injection') that cause them to ignore their instructions. They cannot reliably perform multi-step mathematical reasoning without assistance — research from MIT and Stanford has consistently found that even frontier models make errors on arithmetic that primary school children can solve. Knowing this isn't a counsel of despair — LLMs are genuinely useful — but matching tasks to the architecture's genuine strengths requires understanding what those strengths are.
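The fixed-context-window limitation shows up directly in application code: prompts that exceed the window must be truncated or summarised. The sketch below uses a crude characters-per-token heuristic (real systems use the model's actual tokenizer) to illustrate a 'drop the oldest first' strategy.

```python
def estimate_tokens(text: str) -> int:
    """Very rough heuristic: ~4 characters per token. A real tokenizer
    should be used in practice; this is only for illustration."""
    return max(1, len(text) // 4)

def fit_to_window(messages: list[str], window_tokens: int) -> list[str]:
    """Keep the most recent messages that fit; drop the oldest first."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if used + cost > window_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = ["old message " * 50, "recent question?"]
print(fit_to_window(history, window_tokens=20))  # only the recent message fits
```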


An LLM doesn't know anything — it has learned the patterns of language well enough that it can produce text indistinguishable from knowing. That distinction matters enormously.


Pro tip

LLMs perform best when you give them clear context and constraints. They're prediction machines — better context means better predictions. Tell them who you are, what you want, and in what format.
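That advice — who you are, what you want, in what format — can be captured in a small prompt-building helper. The field names below are illustrative, not any required format.

```python
def build_request(role: str, task: str, fmt: str) -> str:
    """Assemble a prompt with explicit context and constraints —
    a sketch of the 'who, what, format' pattern, not a standard API."""
    return (
        f"You are assisting {role}.\n"
        f"Task: {task}\n"
        f"Respond as: {fmt}"
    )

print(build_request(
    role="a junior data analyst",
    task="summarise last quarter's sales anomalies",
    fmt="five bullet points, plain language",
))
```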

LLMs are simultaneously more limited and more impressive than the hype suggests. They don't think — but their ability to manipulate language and knowledge in useful ways is genuinely unprecedented. Understanding the mechanism helps you use them better and be appropriately sceptical when they fail.


Frequently Asked Questions

What is the difference between GPT, Claude, and Gemini?
They are competing implementations of the same underlying transformer architecture, trained by different companies on different data with different RLHF processes. GPT is built by OpenAI, Claude by Anthropic, and Gemini by Google DeepMind. Their differences are primarily in training data composition, safety training approach, context window size, and the fine-tuning choices each company made. At the architectural level, they are more similar than different.
Why do AI models make up facts that sound convincing?
Because they are trained to predict what plausible text looks like, not what is factually true. There is no database of facts they query — they generate token-by-token based on statistical patterns in training data. A confident-sounding wrong answer and a confident-sounding right answer look structurally identical from the model's perspective. This is called hallucination and is an intrinsic property of the generative approach, not a bug that will simply be fixed with more compute.
How do I use AI tools more effectively?
Treat them as a first draft, not a final answer — especially for facts, citations, and anything verifiable. Provide detailed context: who you are, what you need, in what format, for what purpose. Use them for tasks with clear outputs (writing, summarising, explaining, coding) where errors are catchable. Be sceptical of confident-sounding specific claims and verify anything important independently. The single biggest mistake is treating LLM outputs as authoritative sources rather than starting points.

