Microsoft's demo of Bing AI also made factual errors

Microsoft executive Yusuf Mehdi next to a presentation showing OpenAI's logo. (Credit: Jason Redmon—AFP via Getty Images)

After a single mistake in Google’s demo of its new A.I. program erased billions of dollars in market value, employees complained that the company had rushed out the news to get ahead of Microsoft’s own A.I. announcement a day later.

Except it turns out that Microsoft’s demo also contained answers that were incomplete, confusingly sourced and, at worst, entirely incorrect, according to an analysis published by independent A.I. researcher Dmitri Brereton.

These errors highlight a growing problem with A.I. chatbots like OpenAI’s ChatGPT and Google’s Bard: Both employees and users seem to trust that these bots’ well-written and conversational answers are accurate, meaning mistakes at a public event might go unnoticed for days.

Wrong answers

During Microsoft’s demo of its new chatbot, aired last week, the company’s corporate vice president for search, Yusuf Mehdi, asked the program for “key takeaways” from Gap’s most recent earnings report. The bot obliged with a series of bullet points with the company’s key financial data and even a comparison with fellow clothing company Lululemon’s most recent earnings.

Yet Brereton found that Microsoft’s bot gave incorrect figures. For example, Bing's A.I. said that Gap’s operating margin, adjusted for impairment costs, was 5.9%. According to Gap’s earnings report, the company’s adjusted operating margin for the recent quarter was 3.9%. The unadjusted margin was 4.6%.

Bing's A.I. also said that Gap was forecasting sales growth in the low double digits for the coming quarter. In fact, Gap is projecting a decline in net sales in the “mid-single digits.”

The demo also made mistakes when it came to Lululemon’s financial data. For example, Bing's A.I. reported that the clothing company’s operating margin in the most recent quarter was 20.7%. The company’s earnings report shows an adjusted operating margin of 19.4%.

Further analysis from CNN uncovered that Bing's A.I. would attribute its answers to sources that did not contain the information in question.

Microsoft hopes that programs like ChatGPT, which can generate conversational answers in response to text prompts, can undercut Google’s dominance in search. The company is investing $10 billion in ChatGPT’s developer, OpenAI.

Microsoft did not immediately respond to a request for comment. The tech company previously told Fortune that “the system may make mistakes during this preview period, and user feedback is critical to help identify where things aren’t working well,” in response to an earlier request for comment on strange responses from the Bing chatbot.

Mistakes with Google’s Bard

Microsoft competitor Google was recently hit hard by the revelation that its own A.I., named Bard, made a factual error in its demo.

In a blog post, Google included a video of a user asking the Bard A.I. for interesting facts about the James Webb Space Telescope, launched in 2021. Google’s A.I. claimed the telescope took the first pictures of a planet outside of our solar system. In fact, the first image of an exoplanet was taken by the Very Large Telescope array in Chile in 2004.

Shares in Alphabet, Google’s parent company, crashed 8% after the error was first reported by Reuters, wiping around $100 billion off the company’s market capitalization.

At a conference Monday, Alphabet chairman John Hennessy said that these kinds of mistakes were why the company hesitated to announce its own ChatGPT competitor. “You don’t want to put a system out that either says wrong things or sometimes says toxic things,” he said, according to CNBC.

Hallucinated answers

People testing these chatbots are discovering that certain prompts can lead to strange results, including instances where the bot responds with argumentative or “unhinged” answers. Users of Bing's A.I. are now compiling a repository of “failure cases” to help with “further study.”

Tech leaders like Apple co-founder Steve Wozniak and businessman Mark Cuban have warned that generative A.I. can make mistakes and spread misinformation if used incorrectly. Those working in the field even have a term for A.I.-generated answers that appear entirely made up, calling them "hallucinations."

Even Vint Cerf, an early pioneer of the internet and a current vice president at Google, revealed that a chatbot got details of his biography wrong when asked. “We know it doesn’t always work the way we would like it to,” he said at a conference on Tuesday, according to CNBC.
