Several Apple researchers have confirmed what many had long suspected about AI: there are serious logical faults in its reasoning, especially when it comes to basic grade-school math.
According to a recently published paper from six Apple researchers, 'GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models', the mathematical “reasoning” that advanced large language models (LLMs) supposedly employ can be extremely inaccurate and fragile when the problems themselves are altered even superficially.
The researchers started with GSM8K, a standardized set of roughly 8,000 grade-school mathematics word problems that serves as a common benchmark for testing LLMs. They then slightly altered the wording, swapping out names and numeric values without changing the problem logic, and dubbed the result GSM-Symbolic.
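The core idea is that a problem's surface details can be templated while its logic stays fixed. The sketch below illustrates this with a made-up template and names (not the paper's actual code or data): every variant is a genuinely different-looking problem, but the ground-truth answer follows the same arithmetic.

```python
import random

# Illustrative template: the names and numbers are placeholders,
# but the underlying logic (add two quantities) never changes.
TEMPLATE = (
    "{name} picks {n1} apples on Monday and {n2} apples on Tuesday. "
    "How many apples does {name} have in total?"
)

def make_variant(rng: random.Random) -> tuple[str, int]:
    """Instantiate the template with fresh surface details.

    Returns the question text and the ground-truth answer,
    which is determined by the fixed logic, not the wording.
    """
    name = rng.choice(["Sophie", "Liam", "Ava", "Noah"])
    n1, n2 = rng.randint(2, 20), rng.randint(2, 20)
    question = TEMPLATE.format(name=name, n1=n1, n2=n2)
    return question, n1 + n2

rng = random.Random(0)
for _ in range(3):
    question, answer = make_variant(rng)
    print(question, "->", answer)
```

A model that truly reasons should score the same on every variant; the paper's finding is that accuracy shifts merely because the surface details did.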
On the first set of altered problems, performance dropped by anywhere from 0.3 percent to 9.2 percent. In contrast, the second set, which added a red-herring statement that had no bearing on the answer, triggered "catastrophic performance drops" of 17.5 percent to a massive 65.7 percent.
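The red-herring manipulation is even simpler than the templating. The sketch below, using an invented kiwi problem rather than the paper's actual test items, shows the trick: splice an irrelevant clause into an otherwise unchanged problem. A solver that models the situation should ignore the clause; a pattern matcher often folds it into the arithmetic.

```python
# A base word problem whose answer depends only on the two counts.
BASE = (
    "Oliver picks 44 kiwis on Friday and 58 kiwis on Saturday. "
    "How many kiwis does Oliver have?"
)

# An irrelevant clause: it sounds quantitative but changes nothing.
DISTRACTOR = " Five of the kiwis on Saturday were a bit smaller than average."

def with_noop(problem: str, clause: str) -> str:
    """Insert the irrelevant clause just before the final question sentence."""
    body, question = problem.rsplit(". ", 1)
    return f"{body}.{clause} {question}"

print(with_noop(BASE, DISTRACTOR))
# The correct answer is still 44 + 58 = 102 either way.
```

The steep accuracy drops the researchers report came from exactly this kind of no-op edit: the models subtracted or otherwise used the distractor numbers instead of ignoring them.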
What does this mean for AI?
It doesn’t take a scientist to see how alarming these numbers are: they show that LLMs don’t properly solve problems but instead use simple "pattern matching" to "convert statements to operations without truly understanding their meaning." Change the information in those problems even slightly, and the LLMs’ ability to recognize those patterns falls apart.
The main selling point of current LLMs is the claim that they perform operations much as a human would, but studies like this one prove otherwise: there are critical limitations to how they function. A model that supposedly employs high-level reasoning has no underlying model of logic or the world, and that severely cripples its actual potential.
And when an AI cannot perform simple math because the words are too confusing and don’t follow the exact pattern it expects, what’s the point? Weren’t computers created to perform math at rates humans normally cannot? At this point, you might as well close the AI chatbot and take out your calculator instead.
It’s rather disappointing that the LLMs found in recent AI chatbots all share this same flawed approach. They’re completely reliant on the sheer amount of data they hoard and process to give the illusion of logical reasoning, while never coming close to clearing the next true step in AI capability: symbol manipulation, the kind of abstract knowledge used in algebra and computer programming.
Until then, what are we really doing with AI? What’s the purpose of its catastrophic drain on natural resources if it can’t even do what every corporation peddling its own version claims it can? Having so many papers, this one especially, confirm that bitter truth makes the whole endeavor feel like a waste of time.