Why do LLMs use greedy sampling?

"Greedy sampling is the worst form of sampling, except all those other forms that have been tried from time to time." - Winston Churchill, if he worked in NLP.

Finbarr Timbers
Oct 17, 2023


This is a speculative article, so I’d greatly appreciate any feedback, particularly if you disagree.

When I began working on LLMs, I found it surprising that the SOTA in generative text was to greedily sample from the bare outputs of the neural network. In other words, with GPT-style models, the typical approach to generating a sequence of text is something like the following (a minimal code sketch follows the list):

  1. Run your prompt through your model and generate probabilities over your vocabulary.

  2. Choose the most likely token (possibly with some randomization or preprocessing of the distribution, such as top-k or nucleus sampling).

  3. If the chosen token is <|endoftext|>, you’re done; otherwise, concatenate the new token to your prompt and go back to step 1.
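To make the loop concrete, here’s a minimal sketch of steps 1–3 using a Hugging Face causal LM. The model name, prompt, and generation length are purely illustrative, and a real implementation would reuse the KV cache rather than re-running the full sequence at every step.

```python
# Minimal greedy decoding loop (illustrative; GPT-2 stands in for any
# GPT-style model, and a real implementation would reuse the KV cache).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(50):                                # cap on generated tokens
        logits = model(input_ids).logits               # step 1: (1, seq_len, vocab)
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)   # step 2: greedy pick
        if next_token.item() == tokenizer.eos_token_id:              # step 3: stop on <|endoftext|>
            break
        input_ids = torch.cat([input_ids, next_token], dim=-1)       # append and repeat

print(tokenizer.decode(input_ids[0]))
```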

In RL research on games, it is common to instead perform a much more elaborate computation to choose the next step in your sequence. For instance, AlphaZero uses Monte Carlo Tree Search (MCTS), sketched below. Here, I explore some reasons why LLMs don’t use a fancier decoding algorithm.
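For contrast, here’s a toy sketch of the MCTS loop: selection via UCB, expansion, random rollout, backup. The `game` interface is hypothetical, sign-flipping for two-player games is omitted, and AlphaZero additionally replaces the random rollout with a learned value network and guides selection with a policy prior.

```python
# Toy single-player MCTS (the `game` interface is hypothetical; AlphaZero
# replaces the random rollout with a value network and adds a policy prior).
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = {}        # action -> Node
        self.visits, self.value_sum = 0, 0.0

def ucb(child, parent_visits, c=1.4):
    if child.visits == 0:
        return float("inf")
    exploit = child.value_sum / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore

def mcts(root_state, game, num_simulations=200):
    root = Node(root_state)
    for _ in range(num_simulations):
        node = root
        # 1. Selection: walk down the tree via UCB until an unexpanded node.
        while node.children and not game.is_terminal(node.state):
            parent_visits = node.visits
            _, node = max(node.children.items(),
                          key=lambda kv: ucb(kv[1], parent_visits))
        # 2. Expansion: add a child for every legal action, pick one at random.
        if not game.is_terminal(node.state):
            for action in game.legal_actions(node.state):
                node.children[action] = Node(game.step(node.state, action), parent=node)
            node = random.choice(list(node.children.values()))
        # 3. Rollout: play random moves to the end to estimate the value.
        state = node.state
        while not game.is_terminal(state):
            state = game.step(state, random.choice(game.legal_actions(state)))
        value = game.reward(state)
        # 4. Backup: propagate the result up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    # Act greedily with respect to visit counts at the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```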

But first, a caveat: there’s a lot of literature proposing various ways to do this that I’m not going to engage with, for the sake of time. There is a list of references at the end that I’d encourage you to consult if you want a more detailed treatment.

The current paradigm of language modelling, with GPT-style decoder models, uses greedy autoregressive sampling to generate a sequence of tokens. This is a somewhat surprising choice; if you look at the history of NLP research, particularly the Neural Machine Translation literature, beam search is often needed to reach SOTA performance (e.g. arXiv:1703.03906). Similarly, in games research, search is typically many times stronger than any pure neural network approach, and search will strictly dominate wherever it’s feasible (the exceptions are games like Stratego, where the game tree has too high a branching factor to be searched to any non-trivial depth). In games like Go, Chess, Poker, or Scotland Yard, search methods dominate.
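As a refresher, beam search keeps the k highest-scoring partial sequences at every step instead of committing to one. Here’s a minimal sketch, reusing the same Hugging Face-style model as the greedy example above; real decoders batch the beams and add length normalization.

```python
# Minimal beam search over a causal LM (illustrative; no batching or
# length normalization, which production decoders would add).
# Usage: beam_search(model, input_ids, tokenizer.eos_token_id)
import torch
import torch.nn.functional as F

def beam_search(model, input_ids, eos_token_id, beam_width=4, max_new_tokens=50):
    beams = [(input_ids, 0.0)]          # (token_ids, cumulative log-prob)
    finished = []
    for _ in range(max_new_tokens):
        candidates = []
        for ids, score in beams:
            with torch.no_grad():
                log_probs = F.log_softmax(model(ids).logits[:, -1, :], dim=-1)
            top_lp, top_tok = log_probs.topk(beam_width, dim=-1)
            for lp, tok in zip(top_lp[0], top_tok[0]):
                new_ids = torch.cat([ids, tok.view(1, 1)], dim=-1)
                candidates.append((new_ids, score + lp.item()))
        # Keep only the beam_width best continuations; retire finished ones.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for ids, score in candidates[:beam_width]:
            if ids[0, -1].item() == eos_token_id:
                finished.append((ids, score))
            else:
                beams.append((ids, score))
        if not beams:
            break
    best_ids, _ = max(finished or beams, key=lambda c: c[1])
    return best_ids
```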

By search, I am referring to algorithmic search, which I define as any method that uses additional compute at inference time to improve the answer. This has nothing to do with Google-style search (which I call “information retrieval”).

So why don’t GPTs use search? Well, there are a few answers to this. The first one is a total copout:

  1. GPTs don’t use search, as far as we know. OpenAI recently raised eyebrows when it hired Noam Brown, an expert on search in games, to work on “multi-step reasoning, self-play, and multi-agent AI.” That sounds an awful lot like search to me (and, specifically, a lot like Alpha/MuZero). We also know that Demis Hassabis has talked about Gemini incorporating techniques from AlphaGo, which, again, makes me think of search (not to mention self-play).

    So it’s entirely possible that search is the secret sauce behind GPT-4’s performance, and the lack of it is why the open source world has been unable to match it. I’m skeptical of this (for reasons I’ll get into below), but if I were actively working on LLM research, I’d be focusing on trying to use search.

Let’s assume, then, that the key players (GPT-4, Claude, etc.) aren’t using search. Why?
