Today’s generative AI is capable of amazing things. It can seemingly understand what we say and provide very humanlike responses to our queries. It can identify things in pictures, and come up with new images on command.
But even the latest large language models, like OpenAI’s o1, cannot inherently understand whether something is true. They still fabricate things—a phenomenon known as “hallucination”—and the way they arrive at decisions remains largely opaque.
Overcoming these limitations may mean turning to an old and seemingly outmoded idea to create something new. That outmoded idea is symbolic AI. And the new thing is called neurosymbolic AI, which its advocates say blends the strengths of today’s LLMs with the explainability and reliability of this older, symbolic approach.
Neural vs. symbolic AI
The history of artificial intelligence is one of an almost sectarian struggle between opposing approaches to solving the challenge of creating machines that could learn and “think” like people. The field fractured predominantly along two fault lines (although there were a few smaller factions and sub-factions, too).
In one corner, we have symbolic AI, also known as “good old-fashioned artificial intelligence.” This approach, based on formal logic, rules, and human-readable representations of concepts, began to take off in the 1950s and largely reigned from the 1960s until it hit a dead end in the 1990s. Symbolic AI helped create the chess-playing computers that culminated in Deep Blue, the IBM system that defeated world champion Garry Kasparov in 1997; it was also behind the first chatbots and the “expert systems” that were popular in the 1980s. Symbolic AI is a top-down approach: the computer is given a set of rules, written by humans, and must then learn how to apply those rules to specific examples or circumstances.
In the other corner: brain-like neural networks that can be trained to identify patterns in language and images, and to use statistics to predict things. A neural network learns in a bottom-up way: it takes in a large number of examples during training and infers a rule that best accounts for the patterns it discerns in that data. But that rule is not something that can be easily written down, even as a mathematical formula. It exists as the sum of the outputs of all the nodes of the entire network; the network itself is essentially the rule.
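To make the contrast concrete, here is a minimal sketch of the same toy task handled both ways. It is purely illustrative and not drawn from any system mentioned in this article; the rules, features, and data are invented. The symbolic side applies a rule a person wrote; the neural-style side infers numeric weights from labelled examples, and those weights are accurate but not human-readable.

```python
# Toy illustration: top-down rules vs. bottom-up learning on a spam-filter task.
# All rules, weights, and data here are invented for the sake of the example.

# --- Symbolic: a human writes the rule, the machine applies it ---
def symbolic_is_spam(message: str) -> bool:
    banned = {"free", "winner", "prize"}          # rule authored by a person
    return any(word in message.lower() for word in banned)

# --- Neural-style: the machine infers weights from labelled examples ---
def train_perceptron(examples, labels, features, epochs=20, lr=0.1):
    weights = {f: 0.0 for f in features}
    bias = 0.0
    for _ in range(epochs):
        for text, label in zip(examples, labels):
            x = {f: float(f in text.lower()) for f in features}
            pred = 1 if sum(weights[f] * x[f] for f in features) + bias > 0 else 0
            err = label - pred                    # nudge weights toward the data
            for f in features:
                weights[f] += lr * err * x[f]
            bias += lr * err
    return weights, bias   # the learned "rule" is just numbers, not readable logic

examples = ["free prize inside", "meeting at noon", "you are a winner", "lunch tomorrow?"]
labels   = [1, 0, 1, 0]
features = ["free", "prize", "winner", "meeting", "lunch"]
print(symbolic_is_spam("Claim your free prize"))      # True, and we can say exactly why
print(train_perceptron(examples, labels, features))   # works, but the weights are opaque
```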
The neural approach showed early promise in the 1950s, had a brief resurgence in the 1980s, and then endured a long winter until taking off again with a series of major breakthroughs in the 2010s. (Around this time, the approach was also rebranded as “deep learning,” a reference to the fact that the many-layered neural networks being used had “depth,” but also a clever bit of marketing to make the method seem somehow more profound than other machine learning techniques.)
Deep learning has been placed on a high altar in recent years, especially because of its application in the large language models (LLMs) that power today’s generative AI. However, LLMs often make things up because they aren’t able to truly understand concepts or to reason logically. They are also “black boxes”—even their developers don’t fully understand how they arrive at their conclusions.
That makes generative AI too untrustworthy for many users, particularly businesses and governments. And if AI is to evolve into some kind of human-level or superhuman intelligence—nebulous concepts often referred to as “artificial general intelligence” (AGI) and “artificial superintelligence” (ASI)—this lack of reliability may be a showstopper.
“There is room for progress, but intrinsically this [LLM] approach will hit a dead end sooner or later,” said Pieter den Hamer, who leads the generative AI resource center at analyst firm Gartner.
Combining ‘System 1’ and ‘System 2’ thinking
Enter neurosymbolic AI, an increasingly popular idea that—as its name suggests—brings together the neural and symbolic approaches with the aim of getting the best out of each, in a complementary way.
Many in the field draw a comparison with psychologist Daniel Kahneman’s thesis that there are two kinds of thinking: System 1 thinking, which is fast and instinctive and used in perception; and System 2 thinking, which is the slower and more conscious thinking we do when we consider things and make decisions. Neural networks such as LLMs are very good at the first kind, but it may be that symbolic AI is needed for System 2–like thinking.
“Neurosymbolic AI seems to be one of the necessary steps to achieve AGI at some point in the future, because we need this better reasoning and more reliable intelligence than we have today,” said den Hamer.
Some pioneers of deep learning are openly scornful of the neurosymbolic approach. Meta chief AI scientist Yann LeCun said in June that it was “incompatible with deep learning,” and Geoffrey Hinton—who quit Google last year over his fears about AI’s negative impacts—insists that deep learning alone will “be able to do everything.”
But an awful lot of very credible people and companies think there’s something to it.
The seeds of neurosymbolic AI
IBM, which says it sees neurosymbolic AI as “a pathway to achieve artificial general intelligence,” has been experimenting with its use in providing reasoned answers to queries about images. When Google showed earlier this year that its new AlphaProof and AlphaGeometry 2 systems could perform at silver-medalist level on International Mathematical Olympiad problems, those were demonstrations of neural language models working hand in hand with symbolic deduction engines.
The neural networks involved in neurosymbolic AI don’t have to be ChatGPT-like LLMs, but some of the most interesting neurosymbolic applications use LLMs as a component, in ways designed to overcome their limitations.
“LLMs are not designed to perform formal computation—that is, deterministically, efficiently, precisely, consistently, and reliably following a set of rules or mathematical formula. That is what classic algorithmic programming is for,” said David Ferrucci, the computer scientist who led the team behind IBM’s Jeopardy!-winning Watson AI system, and who went on to found a neurosymbolic AI company called Elemental Cognition. (Ferrucci has now left Elemental Cognition and has become managing director of the Institute for Advanced Enterprise AI, which was launched earlier this month.)
The neurosymbolic approach holds particular promise for AI applications in heavily regulated industries and those where “for ethical reasons people don’t want to rely on AI that has poor transparency or is not very reliable,” said den Hamer.
“Neurosymbolic systems can reach 100% precision, which means being able to fully trust the answers [they give] as accurate, and be fully auditable with every influence on the answer being visible to the customer,” said William Tunstall-Pedoe, a key creator of what became Amazon’s Alexa virtual assistant. He is now the founder and CEO of neurosymbolic AI firm Unlikely AI, which raised $20 million last year but has yet to reveal its product.
“For many applications, such as health care or finance, this can make the difference for adopting the technology at all,” he said.
There are many flavors of neurosymbolic AI, but those being deployed today mostly involve having the neural and symbolic components work alongside each other and interact when they need to, rather than being tightly integrated into a new kind of model—a more theoretical vision for now.
Some even argue that, as soon as you give an LLM access to a tool like a code interpreter or a search engine, that qualifies as neurosymbolic AI. “An LLM on its own is not going to be the most reliable calculator ever,” said Jaime Sevilla, director of the Epoch AI research institute. “But nevertheless it can do what humans do—like, just use a calculator.”
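That “just use a calculator” idea can be sketched in a few lines. In the example below, `call_llm` is a hypothetical stand-in for any chat-model API rather than a real library call, and the prompt format is invented; the point is simply that exact arithmetic is routed to a small deterministic evaluator instead of being left to the model’s statistical guesswork.

```python
# Minimal sketch of tool use as a lightweight neurosymbolic pattern.
import ast
import operator as op

OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv}

def calculator(expression: str) -> float:
    """Safely evaluate a plain arithmetic expression, e.g. '17 * 23 + 4'."""
    def evaluate(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError("unsupported expression")
    return evaluate(ast.parse(expression, mode="eval").body)

def answer(question: str, call_llm) -> str:
    # Ask the (hypothetical) model to either answer directly or emit CALC(<expression>).
    reply = call_llm(
        "If the question needs exact arithmetic, respond only with "
        f"CALC(<expression>); otherwise answer normally.\nQuestion: {question}"
    )
    if reply.startswith("CALC(") and reply.endswith(")"):
        return str(calculator(reply[5:-1]))   # the symbolic tool gives the exact result
    return reply                              # the model answers everything else itself
```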
But far more complex implementations are already emerging.
Reasoning engines and new models
Elemental Cognition has created a reasoning engine that uses LLMs to handle the natural-language queries of the user, while relying on a separate problem-solving component (a so-called dynamic constraint resolution algorithm) to reliably do what the user wants.
This is useful for tasks such as optimization and logistics; the global airline alliance Oneworld, whose members include British Airways and American Airlines, uses Elemental Cognition’s system to power its new AI travel agent.
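Elemental Cognition has not published its engine’s internals, but the general division of labor can be sketched roughly like this: a simulated LLM step turns the traveller’s request into structured constraints, and a small brute-force solver, standing in here for a real constraint-resolution component, returns only itineraries that provably satisfy every one of them. The flights, prices, and parsing function below are all invented.

```python
# Illustrative sketch only, not Elemental Cognition's implementation.
from itertools import product

FLIGHTS = {  # invented sample data: (origin, destination) -> [(flight, depart, arrive, price)]
    ("LHR", "JFK"): [("BA117", 8, 11, 420), ("AA101", 10, 13, 380)],
    ("JFK", "DFW"): [("AA2402", 14, 17, 150), ("AA2410", 12, 15, 210)],
}

def parse_request_with_llm(text: str) -> dict:
    # Stand-in for the neural step: a real system would have an LLM produce this structure.
    return {"legs": [("LHR", "JFK"), ("JFK", "DFW")], "max_price": 600, "arrive_by": 18}

def solve(constraints: dict):
    options = [FLIGHTS[leg] for leg in constraints["legs"]]
    for itinerary in product(*options):
        total_price = sum(f[3] for f in itinerary)
        connects_ok = all(a[2] < b[1] for a, b in zip(itinerary, itinerary[1:]))
        if (connects_ok and total_price <= constraints["max_price"]
                and itinerary[-1][2] <= constraints["arrive_by"]):
            yield itinerary, total_price   # every answer satisfies every stated rule

request = "Fly London to Dallas via New York, arriving by 6pm, under $600."
for itinerary, price in solve(parse_request_with_llm(request)):
    print([f[0] for f in itinerary], price)
```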
Meanwhile, serial entrepreneur Wayne Chang, who cofounded the mobile crash-reporting service Crashlytics among many other ventures, just took the wraps off a startup called Reasoner, which also offers a neurosymbolic reasoning engine; it is already powering Chang’s Patented.ai service for intellectual property lawyers.
The Reasoner engine marries LLMs with knowledge graphs (organized representations of real-world objects and concepts and the relationships between them), which allows it to read documents and let users ask detailed questions about them, with more accurate results than pure LLM technology could deliver. Chang claims accuracy approaching 100%, although this has yet to be independently verified.
What’s more, Reasoner provides a very legible run-through of how it answers questions, constantly referring back to the source material and explaining each step of its deductive process. “No enterprise [AI] app will succeed if you do not have a neurosymbolic engine running, because no customer will be able to trust it,” said Chang.
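Reasoner’s implementation has not been published, but the general pattern (facts extracted into a graph, then queried with every hop recorded) can be sketched as follows. The triples, document names, and patent numbers below are invented; in a real system an LLM would extract them from source documents.

```python
# Sketch of the extraction-plus-provenance pattern, not Reasoner's actual code.
TRIPLES = [
    # (subject, relation, object, source passage) -- invented example data
    ("Patent-123", "cites", "Patent-045", "filing.pdf, p. 4"),
    ("Patent-045", "assigned_to", "Acme Corp", "assignment.docx, §2"),
    ("Patent-123", "filed_in", "2019", "filing.pdf, p. 1"),
]

def query(subject: str, relation: str):
    """Follow one edge of the graph, returning both the answer and its evidence."""
    return [(obj, source) for s, r, obj, source in TRIPLES
            if s == subject and r == relation]

def answer_with_trace(subject: str, path: list[str]):
    """Walk a chain of relations, recording each hop so the reasoning is auditable."""
    trace, current = [], subject
    for relation in path:
        matches = query(current, relation)
        if not matches:
            return None, trace            # refuse to guess rather than hallucinate
        current, source = matches[0]
        trace.append(f"{relation}: {current}  [{source}]")
    return current, trace

# "Who is the assignee of the patent cited by Patent-123?"
answer, steps = answer_with_trace("Patent-123", ["cites", "assigned_to"])
print(answer)          # Acme Corp
for step in steps:     # each hop points back to its source document
    print(" -", step)
```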
SAP has also been using neurosymbolic AI in various ways. As chief AI officer Philipp Herzig explained to Fortune, these are so far mostly of the loosely coupled variety, though SAP is also working on a more futuristic, tightly integrated neurosymbolic system.
The German enterprise systems company wanted to train an LLM on its ABAP programming language, but traditional fine-tuning only got it to around 80% accuracy. So it introduced a formal parser into the mix to check each generated token for legitimacy, rejecting any invalid token and demanding another from the model. That got the accuracy of the LLM’s coding ability up to 99.8%, Herzig said. SAP is already using the model internally and intends to release it as a product early next year.
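SAP has not detailed how this works under the hood, but the parser-in-the-loop idea, often called constrained decoding, can be sketched as follows. Here `propose_tokens` stands in for an LLM that returns candidate next tokens ranked by probability, and `is_valid_prefix` stands in for a formal parser that accepts or rejects a partial program; both are hypothetical.

```python
# Minimal sketch of constrained decoding, not SAP's actual implementation.
def generate(prompt: str, propose_tokens, is_valid_prefix, max_tokens: int = 64) -> str:
    program = ""
    for _ in range(max_tokens):
        # Ask the neural side for ranked candidates for the next token.
        for candidate in propose_tokens(prompt, program):
            if candidate == "<eos>":
                return program
            # The symbolic side vetoes anything the grammar cannot accept;
            # the model's next-best suggestion is tried instead.
            if is_valid_prefix(program + candidate):
                program += candidate
                break
        else:
            break   # no candidate parses: stop rather than emit invalid code
    return program
```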
The company also found that LLMs can’t gain an inherent understanding of its metadata model, so when the user asks for details from the information stored in an SAP system, “GPT comes up with very beautiful names, but 75% is made up,” Herzig said. The solution was to put all that metadata into a huge knowledge graph and give the LLM access. “All of a sudden, you basically stop the hallucinations.”
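Again, SAP has not published details, but a loose sketch of that grounding pattern looks something like this; the metadata graph, prompt wording, and `call_llm` function are invented stand-ins. The model is handed only names that actually exist, so it has nothing beautiful to make up.

```python
# Hypothetical sketch of grounding an LLM against a metadata knowledge graph.
METADATA_GRAPH = {
    # entity -> fields that actually exist in the (invented) system
    "SalesOrder": ["OrderID", "CustomerID", "NetAmount", "Currency"],
    "Customer":   ["CustomerID", "Name", "Country"],
}

def grounded_answer(question: str, call_llm) -> str:
    # Retrieve only the entities and field names the graph actually contains.
    relevant = {e: f for e, f in METADATA_GRAPH.items() if e.lower() in question.lower()}
    prompt = (
        "Answer using ONLY the entities and fields listed below. "
        "If the information is not listed, say so instead of guessing.\n"
        f"Metadata: {relevant}\nQuestion: {question}"
    )
    return call_llm(prompt)   # the model now names real fields instead of inventing them
```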
Herzig said SAP was also working on its own foundation model, based on structured data, with the aim of predicting things in a structured way. The “large graph model” approach involves combining knowledge graphs with the transformer architecture that underpins LLMs, during the learning phase. “This more integrated learning approach is something that also looks very promising,” he said.
Potential upsides
Of course, companies wedded to the pure-LLM approach are also trying to overcome the limitations of their chosen technology. Part of this involves throwing ever more data at the LLMs during their training, though it has recently become apparent that this approach may be seeing diminishing returns.
OpenAI’s new o1 models also attempt reasoning at the question-answering (“inference”) stage, following a sequence of steps to work out answers that are most likely to be correct. This appears to involve repeated inference, as well as a search process across multiple possible steps that the LLM generates, making the process slow and relatively expensive—and necessarily using more energy than straightforward inference does.
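OpenAI has not disclosed how o1 works internally, but the generic family of inference-time techniques it is thought to resemble can be sketched as follows: sample several candidate reasoning chains and keep the one a scoring function prefers. Both `sample_chain` and `score_chain` are hypothetical stand-ins (an LLM sampler and a learned or rule-based verifier), and the extra sampling is exactly why this style of reasoning is slower, costlier, and more energy-hungry than a single pass.

```python
# Generic sketch of best-of-N inference-time reasoning; not OpenAI's method.
def reason(question: str, sample_chain, score_chain, n_samples: int = 8) -> str:
    best_answer, best_score = None, float("-inf")
    for _ in range(n_samples):
        steps, answer = sample_chain(question)        # one step-by-step attempt
        score = score_chain(question, steps, answer)  # verifier's judgment of the chain
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer   # roughly n_samples times the compute of a single answer
```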
Some argue that the neurosymbolic approach will be far more energy-efficient—a big deal in an AI industry that is already straining power grids.
“There are substantial environmental benefits to neurosymbolic AI that come from shifting a large percentage of the processing from very large deep learning models running on GPUs to much more efficient symbolic processing typically on CPUs,” said Tunstall-Pedoe. “This portion of the processing is many orders of magnitude more energy efficient. With models like OpenAI’s o1 doing far more processing than its predecessors to produce results, there is also a continued trend of LLMs increasing their GPU usage rather than becoming more efficient.”
The vast computational requirements of cutting-edge LLMs also make them prohibitively expensive to train, for all but a handful of companies. According to Tunstall-Pedoe, this makes the neurosymbolic approach a more realistic one for many companies and especially for startups.
“Training large language models involves spending a vast amount of money purely on GPU time during model training. There may also be substantial costs borne by the startup when their models are used,” he said. “When developing a neurosymbolic model, this can be much reduced, and as a result, there is less need for a very large round, and money can be invested into neurosymbolic research and novel techniques, as well as invested into the products that utilize the neurosymbolic platform.”
Big hurdles remain
None of this is to say that the way forward for neurosymbolic AI is clear.
For one thing, LLM makers like OpenAI have every incentive to overcome the limitations of their chosen approach, and it is yet to be proven that they cannot do so. Some researchers and companies are also working on alternative ways of moving to more reliable AI. And—at least for now—the neurosymbolic approach comes with its own disadvantages.
Neurosymbolic AI systems are relatively slow compared with the likes of ChatGPT, and less adept at dealing with ambiguous language and implicit knowledge. They are also harder to scale because of the labor involved in establishing and maintaining the rules and relationships on the symbolic side.
“You need to show your system can scale—this is the promise of deep learning and LLMs,” said Sevilla. “Whichever neurosymbolic method you come up with, you need to have this convincing story of how, if you give it access to more computational resources, it will get better and better over time.”
Some argue that neurosymbolic AI will need new hardware to reach its full potential.
Zishen Wan, a PhD student at Georgia Tech who recently coauthored a paper on the need for new architectures and is about to release another paper proposing one, said neurosymbolic AI “often runs kind of slow” on today’s AI chips, which are much better suited to neural networks. He explained that this was because neurosymbolic AI uses more diverse compute kernels, and also because it is less efficient than neural networks at reusing data—meaning it needs to move data around a lot more.
This may now be a chicken-and-egg problem. “We’re seeing many promising applications, but is that great enough for [the industry] to create completely new neurosymbolic hardware?” Wan asked. “That’s a bit hard to say.”
Correction: This article was updated on Dec. 18 to note Ferrucci's new affiliation.