
Are LLMs Partial Lookup Tables?

2025/04/29

Many of you might be familiar with Searle's Chinese room thought experiment.

If not, here's the basic idea.

A person is sitting in a room that has an opening, through which the outside world sometimes deposits pieces of paper covered in Chinese characters. Now, our person doesn't actually understand Chinese; fortunately though, the room also contains copious notes, describing sequences of characters that could come in & instructions for the required responses to them.

No actual explanation is given though. The notes look like... "if you see the characters '你好吗?', you respond with '我很好'".

Our person is definitely not going to learn Chinese from this, at least not especially efficiently.

The entire point of the thought experiment was to evoke some inner conflict: the person clearly doesn't understand what's going on; the room is, well, a room, so it doesn't either; so who or what exactly understands Chinese in this system?

The setup can be used as an argument for symbol manipulation not being equivalent to true understanding. Meanwhile, critics argue that the room, the system itself (including the human symbol-manipulator) is the one that speaks Chinese.

But... does it really?

Would said room, system or not, be capable of learning new words? Of generalizing to new situations?

Or is it just a rigid, static image of someone's mind who actually understands what's going on?

It might be a spectrum

On one end, there is a pure lookup table. In goes the entire context of the conversation; there is one lookup, of "what do we do if we are at this precise point in the conversation"; out goes the one, deterministic, pre-written answer... an answer which, nevertheless, still makes sense, since someone who understands things took the time to write down a response for every. single. one. of the inputs that could ever happen.

(One might question the feasibility of this, given the comparatively meager number of atoms in the universe; it's a thought experiment for a reason though. Also, you can totally do it if you replace "every possible conversation" with "every possible combo of two-digit numbers you might ever think of multiplying".)
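To make that end of the spectrum concrete, here is a minimal sketch in Python (my own illustration, not anything canonical). All the "understanding" lives in whoever filled in the table; none of it lives in the lookup.

    # A "pure lookup table" for two-digit multiplication: every possible
    # question gets a pre-written answer, computed once by someone (or
    # something) that actually understands multiplication.
    answers = {(a, b): a * b for a in range(10, 100) for b in range(10, 100)}

    def multiply(a: int, b: int) -> int:
        # No arithmetic happens at "answer time", only retrieval.
        return answers[(a, b)]

    print(multiply(42, 57))  # 2394, without ever "understanding" anything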

Meanwhile, on the other end, there is the Human Soul with its Incomparable Gift of Consciousness that Just Sees Things.

C++ code, being a partial copy of human souls, is somewhere in between.

So are the kids with excellent training who can add up streams of single-digit numbers in just a couple hundred milliseconds.

And now that we have actual computers that you can talk to, and that do seem to understand what you're saying...

... where are they on this line?

Parrot Theory

You will surely find people who will say: "very close to a human".

After all, have you tried talking to one? They are smart, they are funny, they can solve math and programming problems better than 90% of humanity, they can reason about things, what more do you need to declare them intelligent?

They might not be perfect or especially well-rounded, but this is clearly intelligence.

Meanwhile, you could also argue that they are just interpolating between the immense number of training data points that they have seen. They don't really understand a lot of the things that they seem to be competent with! It's just... there was, once upon a time, something similar on the internet; not quite as recognizable as a search engine result, but still not especially far from one. They couldn't come up with all these insights alone: they need all the human work that once went into it.

As such, they will get stuck at the level of human achievement.
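A caricature of that view in code, purely as a sketch (not a claim about how transformers actually work): a "model" that answers by retrieving whatever training example looks closest to the prompt. The training data and the similarity metric below are made up for illustration.

    # The "parrot" caricature: answer by retrieving the most similar thing
    # seen during training.
    training_data = {
        "how do I reverse a list in python": "use reversed() or the slice [::-1]",
        "why is the sky blue": "Rayleigh scattering favors shorter wavelengths",
    }

    def similarity(a: str, b: str) -> float:
        # Crude word-overlap similarity, a stand-in for any fancier metric.
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / max(len(wa | wb), 1)

    def answer(prompt: str) -> str:
        best = max(training_data, key=lambda q: similarity(prompt, q))
        return training_data[best]

    print(answer("why is the sky blue at noon"))  # parrots the closest memory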

To bring up an example that does not work like this, take the game of Go. We started training AlphaGo the way we train LLMs: on huge databases of games, originally played by humans. But then we switched over to... not using any historical, human-derived input at all: AlphaZero started from, well, zero, and yet, just by playing against itself, it surpassed the level of play that humanity could achieve, without even looking at what we did. It clearly understands what's going on in this game.
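The shape of that idea, very roughly, in code (this is not AlphaZero's actual algorithm; the Model class and play_one_game below are placeholders just to show the loop): the model generates its own games, learns from them, and the next batch is played by the slightly improved model. No human games anywhere.

    import random

    class Model:
        def update(self, games):
            # Placeholder for the learning step (in the real thing, a neural
            # network is trained on positions and outcomes from these games).
            pass

    def play_one_game(player_one, player_two):
        # Placeholder: in the real thing, the model plays both sides of a
        # full game of Go against itself.
        return random.choice(["first player wins", "second player wins", "draw"])

    def train_by_self_play(model, iterations=10, games_per_iteration=100):
        for _ in range(iterations):
            games = [play_one_game(model, model) for _ in range(games_per_iteration)]
            model.update(games)  # learn only from games it generated itself
        return model

    train_by_self_play(Model())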

You could not (currently) bootstrap an entire human-level civilization just by launching a few tens of thousands of lines of code on a big machine with a GPU, and then... waiting a lot.

It is understandable why. For example, language models only interact with text. They have never really seen three-dimensional objects, so all their knowledge about them is a mere lookup table, based on shadows of activities once performed by a human visual cortex. They might figure out some regularities in how humans talk about colors, for example, but they will never have the visceral feeling of seeing the color red. They will have about as much intuitive understanding of this as humans have of quantum mechanics.

This is, by the way, the problem of "symbol grounding", long unsolved (unsolvable?) by AI technology. You can call your symbol "red" in the code, but nothing will connect your mere name to the redness in the real world, so...

... well, unless you take a picture and feed it to your tokenizer? Like many multimodal models do these days? So that they will most definitely know what the difference between "red" and 🔴 is?

Well yeah, that will do it.
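Very roughly, and only as a sketch of the idea (no particular model's actual pipeline): the picture gets cut into small patches, and each patch is turned into an embedding that sits next to the text tokens. The random image and random projection here are stand-ins; in a real model the projection, and everything after it, is learned.

    import numpy as np

    # Cut a 224x224 image into 16x16 patches and project each one into an
    # embedding, so the picture becomes a sequence of "visual tokens".
    image = np.random.rand(224, 224, 3)
    patch = 16

    patches = image.reshape(224 // patch, patch, 224 // patch, patch, 3)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * 3)

    projection = np.random.rand(patch * patch * 3, 512)
    image_tokens = patches @ projection

    print(image_tokens.shape)  # (196, 512): 196 "visual words", 512 dims each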

Anyway... this might have been the one unfortunate example that got solved oddly quickly. But: there are still a lot of areas where their seeming proficiency is thanks to us doing a lot of work first. We're still a lot better at playing Pokémon games, apparently; they're terrible at planning. And even in the visual realm: yes, the input is not just text now, but even the pictures you find on the internet contain valuable information that would otherwise be hard for them to obtain. They don't yet have a complete model of reality: they understand some of it & copy the rest of it from us so well that we don't even notice.

They're part understanding, part lookup table.

... but... aren't we all?

At least those of us who read the symbols of a book about quantum mechanics, instead of just... gaining an intuitive understanding of electron orbitals at the age of 4 during that random beach trip?

("should be obvious, really? have you ever built a sand castle? it's... just look! You still don't? Look, now with the salt water? How can you still not???")