Artificial intelligence (AI) refers to any technology exhibiting some facets of human intelligence, and it has been a prominent field in computer science for decades. AI tasks can include anything from picking out objects in a visual scene to knowing how to frame a sentence, or even predicting stock price movements.
Scientists have been trying to build AI since the dawn of the computing era. The leading approach for much of the last century involved creating large databases of facts and rules and then getting logic-based computer programs to draw on these to make decisions. But this century has seen a shift, with new approaches that get computers to learn their own facts and rules by analyzing data. This has led to major advances in the field.
Over the past decade, machines have exhibited seemingly "superhuman" capabilities in everything from spotting breast cancer in medical images to playing the devilishly tricky board games chess and Go, and even predicting the structure of proteins.
Since the large language model (LLM) chatbot ChatGPT burst onto the scene late in 2022, there has also been a growing consensus that we could be on the cusp of replicating more general intelligence similar to that seen in humans — known as artificial general intelligence (AGI). "It really cannot be overemphasized how pivotal a shift this has been for the field," said Sara Hooker, head of Cohere For AI, a non-profit research lab created by the AI company Cohere.
How does AI work?
While scientists can take many approaches to building AI systems, machine learning is the most widely used today. This involves getting a computer to analyze data to identify patterns that can then be used to make predictions.
The learning process is governed by an algorithm — a sequence of instructions written by humans that tells the computer how to analyze data — and the output of this process is a statistical model encoding all the discovered patterns. This can then be fed with new data to generate predictions.
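As a minimal sketch of that pipeline, the snippet below uses ordinary least squares as the learning algorithm and a straight line as the statistical model. The function names and the hours-versus-scores data are invented for illustration and do not come from any particular library.

```python
def fit_line(xs, ys):
    """Learning algorithm: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept  # the "model" is just these two learned numbers


def predict(model, x):
    """Feed new data into the trained model to get a prediction."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: hours studied vs. exam score
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]

model = fit_line(hours, scores)
print(model)                # (2.0, 50.0)
print(predict(model, 5.0))  # 60.0
```

The humans wrote `fit_line`, but the slope and intercept were discovered from the data, which is the division of labor the paragraph above describes.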
Many kinds of machine learning algorithms exist, but neural networks are among the most widely used today. These are collections of machine learning algorithms loosely modeled on the human brain, and they learn by adjusting the strength of the connections between the network of "artificial neurons" as they trawl through their training data. This is the architecture that many of the most popular AI services today, like text and image generators, use.
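To make "adjusting the strength of the connections" concrete, here is a rough sketch of a single artificial neuron learning the logical OR function. It is a toy, assuming a sigmoid activation and a simple per-example gradient update; real networks chain many such neurons together, and all names here are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(weights, bias, inputs):
    """Weighted sum of inputs passed through a squashing function."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)


def train_neuron(examples, epochs=2000, learning_rate=0.5):
    weights = [0.0, 0.0]  # connection strengths, adjusted during training
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = neuron_output(weights, bias, inputs) - target
            # Nudge each connection in the direction that shrinks the error
            for i, x in enumerate(inputs):
                weights[i] -= learning_rate * error * x
            bias -= learning_rate * error
    return weights, bias


# Training data for logical OR: output is 1 if either input is 1
OR_EXAMPLES = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

weights, bias = train_neuron(OR_EXAMPLES)
for inputs, target in OR_EXAMPLES:
    print(inputs, round(neuron_output(weights, bias, inputs)), "expected", target)
```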
Most cutting-edge research today involves deep learning, which refers to using very large neural networks with many layers of artificial neurons. The idea has been around since the 1980s, but its massive data and computational requirements limited applications. Then in 2012, researchers discovered that specialized computer chips known as graphics processing units (GPUs) speed up deep learning. Deep learning has since been the gold standard in research.
"Deep neural networks are kind of machine learning on steroids," Hooker said. "They're both the most computationally expensive models, but also typically big, powerful, and expressive."
Not all neural networks are the same, however. Different configurations, or "architectures" as they're known, are suited to different tasks. Convolutional neural networks have patterns of connectivity inspired by the animal visual cortex and excel at visual tasks. Recurrent neural networks, which feature a form of internal memory, specialize in processing sequential data.
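The two ideas can be sketched in a few lines each. The convolution below slides one small filter across a signal, reusing the same weights at every position; the recurrent loop carries a running state forward as its "memory." Both are heavily simplified: real networks learn their filter and state weights and apply nonlinear functions, and the weights here are hand-picked assumptions.

```python
def conv1d(signal, kernel):
    """Slide a small filter along the signal, reusing the same weights everywhere."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def recurrent_scan(sequence, w_in=0.5, w_state=0.5):
    """Process items in order while carrying an internal state forward."""
    state = 0.0
    states = []
    for x in sequence:
        state = w_state * state + w_in * x  # new memory mixes old memory and input
        states.append(state)
    return states


# A hand-picked filter that responds where the signal steps up or down
edges = conv1d([0, 0, 1, 1, 0], [-1, 1])
print(edges)  # [0, 1, 0, -1]

# A single early input keeps echoing in the state at later steps
memory = recurrent_scan([1, 0, 0])
print(memory)  # [0.5, 0.25, 0.125]
```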
The algorithms can also be trained differently depending on the application. The most common approach is called "supervised learning," and involves humans assigning labels to each piece of data to guide the pattern-learning process. For example, you would add the label "cat" to images of cats.
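A small sketch of supervised learning, assuming each animal is described by two made-up measurements (weight in kilograms and height in centimeters) rather than a full image. The classifier averages the labeled examples for each class and assigns new data to the nearest average; this nearest-centroid method is one simple choice among many.

```python
def train_centroids(examples):
    """Average the feature vectors that share each human-assigned label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Predict the label whose average example is closest to the new data."""
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Hypothetical labeled data: ([weight_kg, height_cm], label)
labeled = [
    ([4.0, 25.0], "cat"),
    ([5.0, 23.0], "cat"),
    ([20.0, 55.0], "dog"),
    ([30.0, 60.0], "dog"),
]

centroids = train_centroids(labeled)
print(classify(centroids, [6.0, 26.0]))   # cat
print(classify(centroids, [22.0, 50.0]))  # dog
```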
In "unsupervised learning," the training data is unlabelled and the machine must work things out for itself. This requires a lot more data and can be hard to get working — but because the learning process isn't constrained by human preconceptions, it can lead to richer and more powerful models. Many of the recent breakthroughs in LLMs have used this approach.
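For contrast, here is a minimal unsupervised sketch: a one-dimensional version of k-means clustering with two groups. No labels are supplied; the algorithm finds the grouping on its own. The data and the choice of starting centers (the smallest and largest values) are assumptions made to keep the example deterministic, and this is far simpler than the methods used to train LLMs.

```python
def kmeans_1d(points, iterations=10):
    """Split unlabeled numbers into two groups by repeatedly refining two centers."""
    centers = [min(points), max(points)]  # simple deterministic starting guess
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            nearest = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            groups[nearest].append(p)
        # Move each center to the average of the points assigned to it
        centers = [
            sum(g) / len(g) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups


# Unlabeled measurements that happen to fall into two clumps
data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers, groups = kmeans_1d(data)
print(groups)  # [[1.0, 1.2, 0.8], [9.0, 9.5, 8.5]]
```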
The last major training approach is "reinforcement learning," which lets an AI learn by trial and error. This is most commonly used to train game-playing AI systems or robots, including humanoid robots like Figure 01 and soccer-playing miniature robots, and involves repeatedly attempting a task and updating a set of internal rules in response to positive or negative feedback. This approach powered Google DeepMind's ground-breaking AlphaGo model.
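Trial-and-error learning can be illustrated with a toy "two-armed bandit," a standard textbook setup rather than anything resembling AlphaGo's actual training. The agent below does not know that the second action pays off more often; it mostly repeats whatever has worked best so far and occasionally tries something at random. The reward probabilities, exploration rate, and seed are all arbitrary assumptions.

```python
import random


def run_bandit(reward_probs, trials=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner: estimate each action's value from feedback."""
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)  # current estimate of each action's payoff
    counts = [0] * len(reward_probs)
    for _ in range(trials):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))  # explore: try something random
        else:
            action = max(range(len(values)), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Shift the running average toward the reward just observed
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


# Hypothetical environment: action 0 pays 20% of the time, action 1 pays 80%
values, counts = run_bandit([0.2, 0.8])
print("estimated values:", values)
print("times each action was tried:", counts)
```

After enough attempts the agent's estimate for the better action should be clearly higher, and it should choose that action most of the time.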
What is generative AI?
Despite deep learning scoring a string of major successes over the past decade, few have caught the public imagination in the same way as ChatGPT's uncannily human conversational capabilities. This is one of several generative AI systems that use deep learning and neural networks to generate an output based on a user's input — including text, images, audio and even video.
Text generators like ChatGPT operate using a subset of AI known as "natural language processing" (NLP). The genesis of this breakthrough can be traced to a novel deep learning architecture introduced by Google scientists in 2017 called the "transformer."
Transformer algorithms specialize in performing unsupervised learning on massive collections of sequential data — in particular, big chunks of written text. They're good at doing this because they can track relationships between distant data points much better than previous approaches, which allows them to better understand the context of what they're looking at.
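The mechanism transformers use to relate distant data points is called attention. The sketch below is a stripped-down version of scaled dot-product self-attention: each item scores its similarity to every other item, no matter how far apart, and then blends them according to those scores. As a simplifying assumption, the input vectors serve directly as queries, keys and values; real transformers learn separate projections for each and run many attention "heads" in parallel.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Each position looks at every position and takes a weighted blend."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        blended = [
            sum(w * value[i] for w, value in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(blended)
        all_weights.append(weights)
    return outputs, all_weights


# Toy "sentence" of three token vectors; the first and third are identical
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, attn = self_attention(tokens)
print(attn[0])  # first token weights itself and the third token equally
```

Because the first token's weights depend only on similarity, it attends to the matching third token just as strongly as to itself, even though another token sits between them.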
"What I say next hinges on what I said before — our language is connected in time," said Hooker. "That was one of the pivotal breakthroughs, this ability to actually see the words as a whole."
LLMs learn by masking the next word in a sentence before trying to guess what it is based on what came before. The training data already contains the answer so the approach doesn't require any human labeling, making it possible to simply scrape reams of data from the internet and feed it into the algorithm. Transformers can also carry out multiple instances of this training game in parallel, which allows them to churn through data much faster.
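The reason no human labeling is needed can be seen in a short sketch: every position in a piece of text supplies its own training example, with the preceding words as the input and the next word as the answer. The tiny "model" here simply counts which word most often follows another, a drastic simplification assumed for illustration; real LLMs work on sub-word tokens and use neural networks that consider the whole preceding context.

```python
from collections import Counter, defaultdict


def make_training_pairs(text):
    """Turn raw text into (context, next_word) examples with no human labels."""
    words = text.split()
    return [(words[:i], words[i]) for i in range(1, len(words))]


def train_bigram(text):
    """Count how often each word follows each other word."""
    words = text.split()
    table = defaultdict(Counter)
    for previous, following in zip(words, words[1:]):
        table[previous][following] += 1
    return table


def predict_next(table, word):
    """Guess the most frequent follower, or None if the word was never seen."""
    followers = table.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


corpus = "the cat sat on the mat the cat slept"
pairs = make_training_pairs(corpus)
print(pairs[0])  # (['the'], 'cat')
print(pairs[1])  # (['the', 'cat'], 'sat')

table = train_bigram(corpus)
print(predict_next(table, "the"))  # cat
```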
By training on such vast amounts of data, transformers can produce extremely sophisticated models of human language, hence the "large language model" moniker. They can also analyze and generate complex, long-form text very similar to what a human would write. It's not just language that transformers have revolutionized. The same architecture can also be trained on text and image data in parallel, resulting in models like Stable Diffusion and DALL-E that produce high-definition images from a simple written description.
Transformers also played a central role in Google DeepMind's AlphaFold 2 model, which can generate protein structures from sequences of amino acids. This ability to produce original data, rather than simply analyzing existing data, is why these models are known as "generative AI."
Narrow AI vs artificial general intelligence (AGI): What's the difference?
People have grown excited about LLMs due to the breadth of tasks they can perform. Most machine learning systems are trained to solve a single problem, such as detecting faces in a video feed or translating from one language to another, and they can reach a superhuman level at it, working much faster and performing better than a human could. These models are known as "narrow AI" because they can only tackle the specific task they were trained for.
But LLMs like ChatGPT represent a step-change in AI capabilities because a single model can carry out a wide range of tasks. They can answer questions about diverse topics, summarize documents, translate between languages and write code.
This ability to generalize what they've learned to solve many different problems has led some, including DeepMind scientists in a paper published last year, to speculate that LLMs could be a step toward AGI. AGI refers to a hypothetical future AI capable of mastering any cognitive task a human can, reasoning abstractly about problems, and adapting to new situations without specific training.
AI enthusiasts predict that once AGI is achieved, technological progress will accelerate rapidly, reaching an inflection point known as "the singularity," after which breakthroughs will be realized exponentially. There are also perceived existential risks, ranging from massive economic and labor market disruption to the potential for AI to discover new pathogens or weapons.
But there is still debate as to whether LLMs will be a precursor to an AGI, or simply one architecture in a broader network or ecosystem of AI architectures that is needed for AGI. Some say LLMs are miles away from replicating human reasoning and cognitive capabilities. According to detractors, these models have simply memorized vast amounts of information, which they recombine in ways that give the false impression of deeper understanding. If so, they are limited by their training data and are not fundamentally different from other narrow AI tools.
Nonetheless, it's certain LLMs represent a seismic shift in how scientists approach AI development, said Hooker. Rather than training models on specific tasks, cutting-edge research now takes these pre-trained, generally capable models and adapts them to specific use cases. This has led to them being referred to as "foundation models."
"People are moving from very specialized models that only do one thing to a foundation model, which does everything," Hooker added. "They're the models on which everything is built."
How is AI used in the real world?
Technologies like machine learning are everywhere. AI-powered recommendation algorithms decide what you watch on Netflix or YouTube — while translation models make it possible to instantly convert a web page from a foreign language to your own. Your bank probably also uses AI models to detect any unusual activity on your account that might suggest fraud, and surveillance cameras and self-driving cars use computer vision models to identify people and objects from video feeds.
But generative AI tools and services are starting to creep into the real world beyond novelty chatbots like ChatGPT. Most major AI developers now have a chatbot that can answer users' questions on various topics, analyze and summarize documents, and translate between languages. These models are also being integrated into search engines, like Gemini into Google Search, and companies are also building AI-powered digital assistants that help programmers write code, like GitHub Copilot. They can even be a productivity-boosting tool for people who use word processors or email clients.
Chatbot-style AI tools are the most commonly found generative AI service, but despite their impressive performance, LLMs are still far from perfect. They make statistical guesses about what words should follow a particular prompt. Although they often produce results that indicate understanding, they can also confidently generate plausible but wrong answers — known as "hallucinations."
While generative AI is becoming increasingly common, it's far from clear where or how these tools will prove most useful. And given how new the technology is, there's reason to be cautious about how quickly it is rolled out, Hooker said. "It's very unusual for something to be at the frontier of technical possibility, but at the same time, deployed widely," she added. "That brings its own risks and challenges."