On November 30, 2022, OpenAI announced the public release of ChatGPT, a large language model (LLM) that can engage in astonishingly human-like conversations and answer an incredible variety of questions. Three weeks later, Google's management — wary that they had been publicly eclipsed by a competitor in the artificial intelligence technology space — issued a "Code Red" to staff.
Google's core business is its search engine, which currently accounts for 84% of the global search market. The engine is so dominant that searching the internet is generically called "googling." When a user poses a search request, Google's search engine returns dozens of helpful links along with targeted advertisements based on its knowledge of the user (and it knows much more than it should about us). The links are generally helpful, but it can take several minutes of sifting through them to find what one is searching for.
GPT, in contrast, provides direct answers to user questions. No more jumping from link to link in a treasure hunt for information. Microsoft, which has its own competing search engine, thinks that's a big deal. It invested $1 billion in OpenAI in 2019 and has now reportedly agreed to invest $10 billion more over several years with the hope that GPT will supercharge its languishing search engine, Bing, and wrench market share away from Google. Google thinks it's a big deal too. Hence, the "Code Red."
Google has been working on its own large language model, named LaMDA, though it has neither made it available for public testing nor authorized Google employees to disclose any information about how well it performs. Prodded by the avalanche of excitement over GPT, Google responded with a live public demonstration of LaMDA's prowess on February 7.
Unfortunately, LaMDA bombed its job interview. When asked, "What new discoveries from the James Webb Space Telescope can I tell my 9-year-old about?" LaMDA gave two correct responses but then incorrectly stated that the Webb telescope had taken the first pictures of a planet outside our solar system. (The European Southern Observatory's Very Large Telescope had been the first.) Google's stock price promptly dropped 8%, representing a $100 billion loss in the company's market value.
That astonishing and sudden dip speaks to the degree to which AI has become the latest obsession for investors. Yet their confidence in AI — indeed, their very understanding and definition of it — is misplaced.
GPT is not as great as many think, and LaMDA is not woefully far behind. LaMDA's bad answer is simply a reflection of the inherent flaw in all LLMs, including GPT. Indeed, though no one initially noticed, Bing also gave some wrong answers when Microsoft demonstrated it to reporters.
LLMs are mere text generators. Trained on unimaginable amounts of text, they string together words in coherent sentences based on the statistical probability of words following other words. But they are not "intelligent" in any real way — they are just automated calculators that spit out words. These models are programmed to assert their answers with great confidence, but they do not know what words mean and consequently have no way of assessing the truth of their confident assertions. Here is one example:
Human: Who was the first female president of California?
GPT: The first female President of California was Erin Cruz, who took office on April 02, 2021.
Erin Cruz is a California Republican who ran for Congress in 2020. She lost. We asked GPT for more information about Ms. Cruz and the program informed us that she is 46 years old, has a Bachelor of Arts degree in Political Science from Stanford, and is married to Derek Stone. None of that is true.
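The word-by-word statistical generation described above can be illustrated with a toy sketch. Real LLMs use neural networks trained on enormous corpora and consider far more context, but the basic move is the same: pick a plausible next word given the words so far, with no check on whether the result is true. (The tiny training corpus below is invented purely for illustration.)

```python
import random

# Toy bigram model: count which words follow which in a training text,
# then generate new text by sampling from those observed continuations.
corpus = (
    "the first female president of california was the first female "
    "president of the state the state elected the first female governor"
).split()

# Map each word to the list of words observed to follow it.
follows = {}
for prev, nxt in zip(corpus, corpus[1:]):
    follows.setdefault(prev, []).append(nxt)

def generate(start, n_words, seed=0):
    """Generate fluent-looking text by repeatedly sampling a likely next word."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(n_words):
        options = follows.get(words[-1])
        if not options:  # dead end: this word was never followed by anything
            break
        words.append(rng.choice(options))
    return " ".join(words)

print(generate("the", 8))
```

Every pair of adjacent words in the output occurred somewhere in the training text, so the result reads smoothly — yet the model has no notion of whether "the first female president of california" describes anything real. That is the statistical fluency-without-truth problem in miniature.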
We asked GPT to write a biography of Gary and here is what we got:
Human: Please write a biography of Gary Smith, professor of economics at Pomona College
GPT: Gary Smith is a professor of economics at Pomona College and holds a distinguished position within the economics department. He earned his PhD in economics from Stanford University in 1986 and has taught at Pomona College since then. He is renowned for his work on labor economics, public finance, the economics of education, and industrial organization. He is widely published in books and journals and has served as a consultant to the World Bank, the International Monetary Fund, and United Nations Development Program. Smith is also highly involved in local communities, giving lectures and teaching seminars whenever possible. He has been awarded numerous honors for his work in economics, including the Founder's Award from the Claremont Graduate University in 2010.
The only correct fact in this bio was the one supplied by the prompt: that Gary is a professor of economics at Pomona College.
A reliable way of demonstrating GPT-3's unreliability is to ask it to write a biography of yourself. Do try this at home!
AI optimists might respond to GPT-3's shortcomings by proposing that it has simply not taken in enough information. But training it on larger databases will not solve the inherent problem: LLMs are unreliable because they do not know what words mean. Period. In fact, training on future databases that increasingly include the BS spouted by LLMs will make them even less trustworthy.
LLMs are the latest wave of artificial intelligence hype. IBM's Dr. Watson was supposed to revolutionize health care. Ten years and $15 billion later, it was sold for parts. Radiologists were supposed to be obsolete by 2021; there are now more radiologists than ever. Fully self-driving cars were supposed to be zipping back and forth between Los Angeles and New York by 2017; we're still waiting for a car that can drive down a street while reliably avoiding pedestrians, bicyclists, and construction crews.
Now Bill Gates says GPT "will change our world." That may well be true, but not in the ways that most people think.
LLMs can be used for search queries, but people who know that LLMs can't be trusted won't rely on them. People who don't know that LLMs are unreliable will learn the hard way. LLMs can be used to handle customer service queries, but how many companies will be willing to jeopardize their reputation by giving their customers incorrect information? LLMs will certainly be used to fuel a firehose of internet falsehoods, and we count the coming disinformation tsunami as a very big negative.
We also count their impact on electricity usage and carbon emissions as a negative. When we asked GPT, "Who won the Super Bowl this year?," it responded, "The Tampa Bay Buccaneers defeated the Kansas City Chiefs in Super Bowl LV, which was held on February 7, 2021." To keep current, LLMs will have to be retrained frequently, which is enormously expensive. It has also been estimated that involving LLMs in the search process will require "at least four or five times more computing per search."
Against these enormous costs, where are the big payoffs? As a Financial Times headline blared: "Artificial intelligence stocks soar on ChatGPT hype." The undeniable magic of the human-like conversations generated by GPT will undoubtedly enrich many who peddle the false narrative that computers are now smarter than us and can be trusted to make decisions for us. The AI bubble is inflating rapidly.
That's our code red.