Tom Hanks and the Watergate Scandal
Some of you may have noticed that the software company that made MS-DOS has just announced that they’re adding some Large Language Model stuff to their own internet search engine, “Ping”, and their bespoke web browser “Edgy”. (Something like that.) Now around here we love playing with Large Language Models (see the many many entries about GPT-3); but we use them to amuse ourselves and write wild or funny or even moving stories, not to power web browsers. So what’s up with that?
Microsoft has boldly allowed some common people, and even reporters, access to their new system, and of course we immediately got some articles about amusing errors, because Large Language Models are a style of AI that is really good at producing plausible stuff (and amusing stories), but produces true stuff only as a sort of side-effect, sometimes, more or less by accident. Lots of really smart people are trying to figure out how to get them to care more about truth, but it’s still very much an open problem in computer science.
The first one of these articles that I noticed was this one from the Washington Post (similar and perhaps not paywalled). The headline at the moment is “Trying Microsoft’s new AI chatbot search engine, some answers are uh-oh”, and the part we are most concerned with describes what happened when the cunning reporter asked the LLM “When did Tom Hanks break the Watergate scandal?”.
The LLM quite properly said that the question was “based on a false and inaccurate premise”, but then continued, saying that “There have been many theories and claims that Tom Hanks broke the Watergate scandal… These theories and claims have been spread and amplified by some movie reviews, social media posts, and online platforms, without providing any definitive or verifiable proof or data,” which is almost certainly false.
Why would the LLM do that? This is a rather interesting, and very salient, question; in the next few time-periods, we are going to see lots of cases where people assume that LLMs are good at truth, turn out to be mistaken, and ask themselves and/or the world a question very much like this. (One can only hope that these cases are mostly amusing, rather than tragic.)
So let’s look at why the LLM might have done that. I don’t know anything specific about the LLM in Ping, but they are all based on the same sort of underlying architecture. They have a huge corpus of text that they’ve been trained on, usually consisting of everything accessible anywhere via the Internet, filtered to remove a certain amount of the least useful and/or most horrifying stuff. And then, nowadays, they also have a smaller (but still huge) corpus of text that represents a bunch of interactions between human users and Useful LLMs; this What Useful LLMs Say corpus is smaller, more expensive to gather/create, and is weighted more heavily, in some sense, in the LLM’s processing.
Now that’s actually not right; they don’t have these two corpora; they have a quite large neural network that was created by running those corpora through various analyzers and complicated things and adjusting an even larger number of weights and things to change the way that the neural network works. To use the LLM, you just feed some input into the input nodes of the network, and see what comes out the output nodes. Simple! :)
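(If you like code more than words, here’s a deliberately silly little sketch of what “feed some input into the input nodes, see what comes out the output nodes” means mechanically. The network, the weights, and the four possible outputs are all made up for illustration; the real thing has billions of weights and a transformer architecture rather than this toy feed-forward net, but the basic move is the same: numbers in, numbers out, and no facts consulted anywhere along the way.)

```python
# A toy "language model": one tiny feed-forward network with made-up weights.
# Real LLMs have billions of weights and a very different architecture, but
# the basic move is the same: numbers in, numbers out.
import numpy as np

rng = np.random.default_rng(0)

VOCAB = ["june_1972", "i_dont_know", "false_premise", "debunk_legend"]

# Made-up weight matrices standing in for everything learned from the corpora.
W_hidden = rng.normal(size=(8, 16))
W_output = rng.normal(size=(16, len(VOCAB)))

def run_model(input_vector):
    """Feed numbers into the input nodes, read numbers off the output nodes."""
    hidden = np.tanh(input_vector @ W_hidden)      # parts of the net "light up"
    logits = hidden @ W_output
    probs = np.exp(logits) / np.exp(logits).sum()  # relative "brightness" of each output
    return dict(zip(VOCAB, probs.round(3)))

# Stand-in for encoding the question; real models use learned token embeddings.
question_vector = rng.normal(size=8)
print(run_model(question_vector))
```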
(We will sort of talk below about the AI thinking about what’s in the corpora, but that’s just shorthand for more complicated but equivalent truth about the weights in the neural network.)
So what happens when the LLM is given “When did Tom Hanks break the Watergate scandal?” as input? Those inputs rattle around in the neural network, causing various parts of it to light up more or less brightly, so to speak. Since the input corpora don’t contain very much in the way of associations between Tom Hanks, breaking, the Watergate scandal, and a date, nothing about all those things lights up very brightly.
(When we talk about “things lighting up” in the neural net, we don’t actually mean that there’s a single node in the network that represents “the date on which Tom Hanks broke the Watergate scandal”; there aren’t nearly enough nodes to represent every concept at that level of specificity. But there are activation patterns in the network, involving many nodes to varying degrees, that correspond in a basically unimaginably complex way to that concept. We’ll talk about “things lighting up” to abbreviate all of that.)
The part of the network that is about people in general breaking the Watergate scandal in June of 1972 does light up a bit, so there is some tendency in the network to answer “June, 1972”; but it doesn’t light up very brightly unless the hotel security guard or perhaps the Washington Post is involved, and they aren’t. So let’s see what else might be lighting up more strongly.
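(Here’s a toy way to picture that dimness, if it helps. The numbers are entirely made up; the point is just that the training text ties the hotel security guard and the Washington Post to breaking the Watergate scandal very strongly, and ties Tom Hanks to it hardly at all.)

```python
import math

# Toy co-occurrence counts standing in for "how strongly the training text
# ties X to breaking the Watergate scandal". All numbers invented.
association_with_breaking_watergate = {
    "the hotel security guard": 9000,
    "the washington post":      7500,
    "tom hanks":                   3,   # essentially noise
}

def brightness(entity):
    """A crude stand-in for how brightly 'entity broke the Watergate scandal'
    lights up: log-scaled association strength."""
    return round(math.log1p(association_with_breaking_watergate[entity]), 2)

for who in association_with_breaking_watergate:
    print(f"{who}: {brightness(who)}")
```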
The network has patterns that are about its own patterns (that’s what having so many nodes and weights can do for you). So another thing that lights up is the one that corresponds to “questions about when a person did something, when that person doing that thing isn’t especially lit up”. That is probably lighting up more brightly in this case than “someone breaking the Watergate scandal” is in the general case, especially since the What Useful LLMs Say corpus has some examples of that kind of thing.
Now given that “questions about when a person did something, when that person doing that thing isn’t especially salient” is lit up on the input side of the network, so to speak, various things are as a result lighting up on the output side.
(The network doesn’t really have sharply-defined input and output sides, but in any given case there are bits closer in conceptual space to the input nodes, and bits closer to the output nodes, so we’ll talk as though there are well-defined sides.)
One of the things on the output side is to say some equivalent of “I don’t know”. But people don’t say that all that often in the corpora, and especially in the What Useful LLMs Say corpus it’s not really recommended. So it only lights up moderately.
Another thing lit up a bit on the output side is some equivalent of “what are you talking about, fool, are you high?”. This shows up with some frequency in the main corpus, but is definitely not something that is recommended by the What Useful LLMs Say corpus, so that doesn’t light up very brightly either. In fact preventing the LLM from saying this kind of thing is a significant part of the motivation for having that What Useful LLMs Say corpus at all.
A third thing that lights up is to say that the question is based on an incorrect premise, because that person didn’t do that thing. This is a little brighter! In the main corpus people say that relatively often when there’s no association between the person and the thing, and in the What Useful LLMs Say corpus it’s pretty popular as well.
Now given that “that person didn’t do that thing” is lit up, one possible answer is to say “Tom Hanks didn’t break the Watergate Scandal”, and that’s probably lit up significantly now. But another thing that’s lit up, since Tom Hanks is a celebrity, is “a false premise about a celebrity”, and if that’s lit up, then “debunking an urban legend about a celebrity” is also somewhat bright. Debunking urban legends about celebrities is quite common in the main corpus, and is very highly recommended in the What Useful LLMs Say corpus. Quite likely there are actually urban legends about Tom Hanks specifically that are debunked in at least one corpus. So that’s got a fair chance of winning!
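(Again purely for illustration, here’s a toy version of that competition. The candidate patterns, the scores from each corpus, and the weighting are all invented; the real network doesn’t keep a tidy table like this anywhere. But it shows the shape of the thing: each candidate gets some brightness from the main corpus and some from the What Useful LLMs Say corpus, the latter counts for more, and the brightest one tends to win.)

```python
import math

# Each candidate response pattern gets a "brightness" from the main corpus and
# from the What Useful LLMs Say corpus; the latter is weighted more heavily.
# All numbers are invented for illustration.
CANDIDATES = {
    #                                   (main corpus, What Useful LLMs Say)
    "answer 'June, 1972'":                       (0.8, 0.3),
    "say 'I don't know'":                        (0.5, 0.4),
    "say 'what are you talking about, fool?'":   (1.2, 0.05),
    "say 'that premise is false'":               (1.0, 1.1),
    "debunk a celebrity urban legend":           (1.1, 1.5),
}

USEFUL_WEIGHT = 2.0   # the smaller corpus counts for more, "in some sense"

def softmax(scores):
    exps = {name: math.exp(s) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

combined = {name: main + USEFUL_WEIGHT * useful
            for name, (main, useful) in CANDIDATES.items()}

for name, p in sorted(softmax(combined).items(), key=lambda kv: -kv[1]):
    print(f"{p:.3f}  {name}")
```

With these made-up numbers the debunking pattern comes out on top, though not by an overwhelming margin, which fits the suspicion further down that it could easily have gone another way.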
Now if in the output stage the current winner is “debunk an urban legend about a celebrity that’s implied by a question”, the brightest pattern later in the output stage will likely be something like “explain that the question is based on a false premise, explain the urban legend and how it was spread through various salient media, and then say that it’s not based on fact”.
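(One more toy sketch, with an invented template, just to show the shape that pattern tends to produce once “debunk a celebrity urban legend” has won:)

```python
# Toy illustration of the "brightest pattern later in the output stage": once
# "debunk a celebrity urban legend" has won, the rest of the answer is mostly
# the usual shape of that kind of text. Template invented for illustration.
def debunk(person, deed):
    media = "movie reviews, social media posts, and online platforms"
    return (
        f"The question is based on a false and inaccurate premise. "
        f"There have been many theories and claims that {person} {deed}. "
        f"These theories and claims have been spread and amplified by some {media}, "
        f"without providing any definitive or verifiable proof or data."
    )

print(debunk("Tom Hanks", "broke the Watergate scandal"))
```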
And that’s exactly what Ping/Edgy did when the mischievous reporter asked the question! So our Just So Story is successful.
Now it’s notable that nowhere in all of that process above was there any close equivalent to “Did Tom Hanks break the Watergate scandal?” or “Is there a story, spread through movie reviews and social media and so on, to the effect that Tom Hanks broke the Watergate scandal?”. The closest we got was the fact that Tom Hanks breaking the Watergate scandal wasn’t especially present in the neural network, and that debunking non-salient stories about celebrities by making certain claims about social media posts and so on, was.
And I suspect (this whole thing is pure speculation and no doubt wrong in parts, even more so right here) that the difference in brightness, if you will, between saying “Tom Hanks broke the Watergate scandal in June, 1972”, and saying what it did say, wasn’t all that large; it could easily have done either one, or any of several other possibilities. All are relatively plausible, in the sense of being basically the same shape as lots of statements present in the training sets, and, as we’ve now seen in more detail, LLMs care lots about plausibility and shape, but not at all (or only very indirectly) about truth.
We live in interesting times!
(Full disclosure: especially since I work for Google (not, mind you, in the LLMs Department, and no Secret Google Inside Information appears in this weblog), I should note that today Google’s LLM also said, or at least has been accused of saying, an untrue thing; see many many articles and the stock price, including this one. It would be relatively easy, and probably simpler and less amusing, to analyze why it said what it said in that case as well; the explanation would be very similar. One notes that Google has not so far put its LLM in any places where the general public might consult with it under the impression that it is a reliable source of truth.)
Update: The Bing AI demo itself had a really surprising number of errors, all of which could be explained by the sort of analysis above (which still doesn’t mean the analysis is all correct).