
How our ‘Boston Globe’ led to ChatGPT

The name of your favorite newspaper is the kind of thing that used to trip computers up.

The Globe building in 1958. (United Press Photo)

The Boston Globe played a crucial role in the prehistory of ChatGPT. Not the newspaper itself but the name of the publication. This strange fact sheds light on how AI produces language.

“Boston Globe” is a phrase I understand immediately. But someone just learning English might trip on it if they haven’t heard of the newspaper: Boston is a place on the globe, after all, and there can’t be a “Boston” version of the globe. The individual words in the phrase don’t add up to the overall meaning. There are lots of phrases like this in English, such as “Toronto Maple Leafs” and “Air Canada.” You just have to know that “Boston Globe” is a newspaper, that the Maple Leafs are a hockey team, and that Air Canada is an airline.


For a long time, AI systems have struggled with phrases like these, because an algorithm can “know” the definition of each word in a phrase without arriving at the correct meaning of the phrase as a whole. In 2013, Google scientists cracked this problem.

Rather than tell the computer in advance that “Boston Globe” is a newspaper, they let the machine learn that these two words were associated in certain contexts. Simply by learning how often those words occur together, the machine’s algorithms were suddenly able to process the phrase “Boston Globe” as a unit rather than as its individual words — all without knowing what either word, or the phrase as a whole, meant at all.
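The idea can be sketched in a few lines of code. What follows is a toy illustration, not the scientists’ actual recipe: it scores adjacent word pairs by how much more often they appear together than their individual frequencies alone would predict, and the discount value and threshold below are invented for the example.

```python
from collections import Counter

def find_phrases(sentences, delta=1, threshold=0.1):
    """Flag word pairs that co-occur more often than chance would suggest.

    A loose sketch of co-occurrence-based phrase learning; the delta
    discount and threshold are illustrative values, not ones from the
    2013 work.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        word_counts.update(sentence)
        pair_counts.update(zip(sentence, sentence[1:]))

    phrases = set()
    for (a, b), n_ab in pair_counts.items():
        # Subtracting delta discounts rare, accidental pairings.
        score = (n_ab - delta) / (word_counts[a] * word_counts[b])
        if score > threshold:
            phrases.add((a, b))
    return phrases

corpus = [
    ["the", "boston", "globe", "reported", "the", "news"],
    ["i", "read", "the", "boston", "globe", "daily"],
    ["boston", "is", "on", "the", "globe"],
]
found = find_phrases(corpus)
```

On this tiny corpus, “boston globe” scores high enough to be treated as one unit, while a common pairing like “the boston” does not — and at no point does the code consult a definition of either word.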

For the longest time, engineers had tried to figure out how to get knowledge into machines, and it turned out that wasn’t necessary. The machines just needed to be able to calculate how likely one word was to keep company with another word.

All three of the phrases I’ve mentioned — “Boston Globe,” “Toronto Maple Leafs,” and “Air Canada” — were among the ones used to test whether their algorithm was really “learning language.” Of course, ChatGPT was trained on massive amounts of language, over a trillion words. But along the way, “Boston Globe” became a kind of benchmark for AI’s language capacity.


Before Google’s 2013 breakthrough, computers and algorithms couldn’t really do much with language like this. But the chatbots that are overwhelming the internet today adeptly handle language using this advance as well as other techniques developed in the interim. And they do it in a way that lets them recognize that the phrase “Air Canada” isn’t about weather conditions north of the border.

Paradoxically, AI learns language without necessarily knowing what it means. A phrase like “Candlestick Park” — where the San Francisco Giants and 49ers used to play — isn’t a “natural combination” of words, as the AI scientists put it. Confronted with that phrase, it’s actually an advantage for the machine not to “know” in advance what it means. Truly knowing what “candlestick” and “park” describe would trip the algorithm up when those words are put together. Just like there can’t be a “Boston” version of the globe, there’s no obvious way in which a park can also be a candlestick. It’s better to just learn the two words as a chunk of language.

ChatGPT can go far beyond a couple of words: It can put many words in order and make meaningful sentences, essays, and even poems. If you prompt the system to give you a list of major daily newspapers, it will give you the Globe right away. It isn’t looking for meaning; it’s cross-referencing the likelihood of various words in combination, and “Boston” very often sits next to “Globe” when the topic is newspapers.
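That cross-referencing can be illustrated with the crudest possible statistic: counting which word most often follows another. ChatGPT’s actual machinery is a neural network attending to vast stretches of context, but the toy version below — pure counting, invented corpus — shows the basic move of picking the likeliest next word.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows each other word.

    A deliberately crude stand-in for the next-word statistics a
    chatbot relies on; real systems condition on far more context.
    """
    following = defaultdict(Counter)
    for sentence in sentences:
        for a, b in zip(sentence, sentence[1:]):
            following[a][b] += 1
    return following

def predict_next(following, word):
    """Return the most frequent next word, or None if unseen."""
    if word not in following:
        return None
    return following[word].most_common(1)[0][0]

model = train_bigrams([
    ["the", "boston", "globe", "is", "a", "newspaper"],
    ["the", "boston", "globe", "reported"],
    ["boston", "harbor"],
])
```

Ask this model what follows “boston” and it answers “globe” — not because it knows anything about newspapers, but because that’s the combination it has seen most often.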


This isn’t intelligence, which is why the panic that tools like ChatGPT could lead to human extinction is overblown. But ChatGPT’s word-prediction capabilities are legitimately impressive, which is what makes the software more than a toy. Because once a machine can use language like that, it’s not long before it can start making analogies: Boston is to Globe as New York is to Times. It’s hard to resist the creepy feeling that comes over you when the chatbot gets that right. Putting the right word next is really hard to distinguish from intelligence itself.
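The analogy trick becomes concrete if words are represented as points in space, as modern systems do. Real models learn hundreds of dimensions from billions of words; the two dimensions and specific numbers below are invented purely for illustration. But the arithmetic — Globe minus Boston plus New York — is the move itself.

```python
def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def nearest(vectors, query):
    """Return the word whose vector points most nearly the same way
    as the query (cosine similarity)."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = lambda w: sum(x * x for x in w) ** 0.5
        return dot / (norm(u) * norm(v) or 1.0)
    return max(vectors, key=lambda w: cos(vectors[w], query))

# Hand-made toy vectors. The two dimensions are invented for this
# example — roughly "which city" and "is it a newspaper" — whereas
# real systems learn their dimensions from text.
vectors = {
    "boston":   [1.0, 0.0],
    "globe":    [1.0, 1.0],
    "new_york": [2.0, 0.0],
    "times":    [2.0, 1.0],
}

# Boston is to Globe as New York is to ...?
query = add(sub(vectors["globe"], vectors["boston"]), vectors["new_york"])
```

The word nearest the resulting point is “times” — the analogy falls out of simple arithmetic on word positions, with no understanding anywhere in the loop.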

This explains why AI can seem extremely lifelike — even appearing to reason or solve logic puzzles — even though it also has a tendency to reproduce our biases and stereotypes and has a hard time distinguishing cause from effect. Recently, it advised people with eating disorders how to make their condition worse. These predictive failures are called “hallucinations,” bits of misinformation that some suggest will begin running rampant as AI systems become omnipresent in our society. Hallucination is the price of learning how to use surprising phrases like “Air Canada.”


Since our information channels are already flooded with spam, scams, misinformation, and conspiracy theories, we’re going to have to get really good at discerning the difference between machine language and human expression. All this because a machine learned how to handle “Boston Globe.”

Leif Weatherby is associate professor of German and director of the Digital Theory Lab at New York University.