The brave new world of artificial intelligence (AI) is upon us, and we’re now all grappling with what this will mean for us personally, for our society and the world. Make no mistake, huge changes are coming, but in among the inevitable upheaval there’s also a lot of hype and nonsense.
As usual, we at Tom’s Guide are here to help you carve some sense out of the madness. Our job is to dive into the facts, and make sure our readers obtain an informed, balanced and intelligent overview of what AI is and what it’s not.
As part of this role, I’m going to help explain the core elements of the AI ecosystem in plain English. Hopefully this guide will cut through the jargon and give you a clearer idea of what pieces are important, and which are merely window dressing. By the end, you should be able to hold your own at a party when someone starts ranting about AI while clearly not understanding the basics.
So let’s take a look at some of the key terminology you’ll need to impress the room.
The basics of AI
AI — artificial intelligence
So what is artificial intelligence? At its heart AI is any system which attempts to mimic human intelligence by manipulating data in a similar way to our brains. The earliest forms of AI were relatively crude, like expert systems and machine vision. Nowadays the explosion in computing power has created a new generation of AI which is extremely powerful.
AGI — Artificial General Intelligence
AGI is the next level of powerful AI. While current AI is still mostly restricted in what it can do, AGI promises to break free from those limitations and deliver the ability to ‘problem solve’ or ‘reason’ as we humans do.
NB: There is much debate over whether this will result in ‘consciousness’ or ‘emotions’. A conversation made even more difficult by the fact that we still don’t have clear definitions of those terms ourselves.
No-one really has a clear cut definition for AGI, but one version goes something like this:
AGI I: advanced computer level ability (GPT-4 quality?)
AGI II: advanced human level competence (GPT-5 quality?)
AGI III: extreme AI level competence (perhaps GPT-6?)
ASI: artificial superintelligence — completely superior to human abilities.
ASI — Artificial Superintelligence
Artificial Superintelligence (ASI) is a form of AI that doesn't exist yet and is often confused with AGI. In a theoretical future, ASI will be a system with intelligence and capabilities that far exceed humans in all areas.
Neural Networks
A neural network is the computational structure, inspired by the function of the human brain, which does the data number crunching to create the models used in AI. These computer networks calculate mathematical routines at unimaginably fast speeds, using massive processor arrays. Probably best to leave it at that.
Machine Learning (see also deep learning)
Computers read data, identify patterns and make connections between data points, all without explicit programming. The resultant models power AI’s ability to interact with the world. Deep learning algorithms are at the heart of every AI model.
NB: The scary part is when the model uses this data and learns how to do completely unexpected new stuff without being trained or guided. These are called emergent skills, and are what keeps AI safety teams up at night.
Natural Language Processing
This is the software technology which lets models understand, interpret, and generate human language. These processes are part of what gives AI its ‘human’ feel while interacting with the user.
Ideology
Commercial vs Open Source
In the blue corner, open source based technology. Led by megacorp Meta with its Llama model, this is the current best hope for an open, non-commercial future for AI in general. Similarly, Stability AI has driven the revolution in open source AI image and art generation with its Stable Diffusion models.
We've also got newcomer Black Forest Labs with its powerful FLUX.1 family of models that has already swept the web and been integrated into Grok on X.
In the red corner, everyone else. Commercial, for-profit AI companies like OpenAI, Anthropic, Google, Microsoft and others.
And these are just the foundation models. Once we move down to the applications, there are a huge number of entrants on both sides of the fence. Entrepreneurs from Rome to Bangalore are now furiously coding the future to produce commercial and open source products which create art, music, financial analysis and so much more.
Models
Foundation Models
Also known as Generative Foundation Models, these are huge neural network models which have been pre-trained on vast amounts of data to cope with a wide variety of tasks. These models are then fine-tuned to create smaller, cheaper, easier to use models for different purposes. OpenAI’s GPT is a foundation model, as is Google’s Gemini, Anthropic’s Claude, Meta’s Llama and so on.
NB: Many if not all of the biggest Foundation models in use today started out being pre-trained on the free and open Common Crawl Dataset, which is several petabytes of data culled from the internet since 2008. This dataset contains 250 billion pages, with 3 to 5 billion being added every month. All of this data goes through an intricate process of cleaning and checking before being used.
Large Language Models (LLMs)
The most common model reference in AI is that of large language models. This is because these entities have become the centrepiece of the exciting advances in AI we’ve seen in the past few years. A Large Language Model is a package of data and software code which uses training and intense mathematical calculations to recognize connections between words. ChatGPT and Microsoft Copilot are famous examples. But there are different variations, as we’ll see below.
NB: Models are completely different from databases. Fun fact: LLMs are like black boxes. Even the people who create them have no idea what’s going on deep inside the LLM itself as it runs. Interesting, eh?
Generative Models
Foundation models which have been fine-tuned are typically referred to as Generative AI Models. The two most common types currently are transformers and diffusion.
Transformer Model
The Transformer Model archetype was introduced to the world in a 2017 scientific paper published by a Google AI research team. The new breakthrough technology used features like self-attention and parallel processing to dramatically speed up AI responses. As a result transformers ushered in a new age of fast, flexible AI, specifically in the shape of OpenAI’s GPT family. GPT stands for Generative Pre-trained Transformer.
NB: Transformer models, for example in ChatGPT, are very amenable to scaling, enhanced training and fine-tuning. They can also be ‘stacked’ together, which makes them perfect for complex nuanced conversations which involve humor, irony, or cultural references. They’re also great for language translation, text summaries and other data retrieval tasks.
Diffusion Model
Diffusion models were first introduced by a Stanford University team in 2015. The innovative new method creates content by deconstructing noise through a process of diffusion. Convert a dog image into a block of digital noise, then create a new dog image by stripping out the noise until the result resembles a variant of the original dog. The Stable Diffusion and DALL-E AI art generator models are examples of the genre.
NB: The waters have been somewhat muddied recently by the addition of hybrid models like diffusion transformers (DiT), which combine the two technologies into a more efficient, faster operator. Examples include OpenAI’s upcoming text to video tool Sora, and Stable Diffusion 3.
Chatbots
AI-powered conversational agents that can understand and respond to user queries in natural language, providing information, guidance, or entertainment. Chatbots are fine-tuned on Foundation LLMs to exhibit specific communication skills, while also delivering impressive general knowledge performance.
NB: All chatbots are basically not much more than prediction machines. They predict what the next word in a sentence should most likely be, based on context, probability analysis and other clever stuff.
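To make the ‘prediction machine’ idea concrete, here is a deliberately tiny sketch in Python. It simply counts which word most often follows another in a scrap of text, then uses those counts to guess the next word. The function names and the sample sentence are made up for illustration, and real chatbots use neural networks trained on vastly more text rather than simple counting.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count which word follows which in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequently seen follower of `word`, or None if unseen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Illustrative training text (an assumption, not real training data).
corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" most often here
```

The principle is the same one the chatbot uses, just scaled down: look at the context, then pick a likely continuation.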
Generative Pre-trained Transformer (GPT)
Most people think of ChatGPT when they think of modern AI. This is because it was the first consumer friendly AI to hit the world back in November 2022. It was the first time the world saw the latent power of huge data connected to a super easy chat interface. And it was mesmerizing.
Users could use the tool for coding, homework, business analytics, marketing and so much more. For example, much of the financial world now works with GPT-4 based fine-tuned chatbots. These tools perform intricate financial modeling and analysis, powering the global deployment of trillions of dollars.
Multimodal Models
A modality in AI terms is a type of data. So text, images, video, audio and so on are all modalities. As compute power has grown, so too has the power to capture and store different types of data, including huge bandwidth examples such as video. Those models which can handle different modalities, e.g. vision and/or audio, are known as multimodal models.
Large Vision Model (LVM)
Large Vision Models are designed specifically to process visual data like video or images. The line between LVMs and LLMs is blurring as multimodal GPTs arrive on the market, but there are still some specific applications which need the specialist features of a dedicated visual model. Two examples are OpenAI’s CLIP, which can be used for subtitles and captions, and Google’s ViT for visual analysis and classification applications.
Model Architecture Basics
Prompts (and Prompt Engineering)
Prompts are the instructions used to extract the required response from an AI model. They can be text or multimedia based, and how they are crafted will affect the end result. Natural language can be very imprecise, and computers respond better to clear, unambiguous instructions. Which is where ‘prompt engineering’ comes into play. By spending time creating a more precise and accurate prompt, we can improve the end result.
Prompting basics:
- Prompt (the user instructs or asks the AI model for something) – Inference (the AI model calculates its response based on the prompt) – Completion (the result is given to the user).
- Context Window is the total amount of text the model can cope with at any time. The larger the context window, the more complete and accurate the AI responses should be. The context window also helps the model keep track of longer conversations, because it can ‘remember’ more words in the chat interface.
- In Context Learning is also important to the prompt process. This involves giving examples in the prompt to help improve the results. For example: “Write me a haiku about chickens, this is an example haiku: Blah blah…”
- This kind of prompt structure is called zero shot, one shot or few shot prompting. Zero-shot provides no example, one-shot gives one and few-shot more than one example. Using one or few-shot techniques in prompting can significantly improve the results of AI models, especially the ones which feature smaller datasets.
- Instruction vs Role Prompting reflects the difference between giving a simple instruction (‘add 2+2’) or first providing the AI with a role to fill (‘you are an expert math teacher’).
- The most comprehensive prompt engineering will include all the above. Bearing in mind the context window and size of model, the prompt could start with giving the AI a role, giving instructions and then adding in some few-shot examples to guide the AI.
NB: These techniques will become less important over time as the power and size of AI models increase. However, it is likely that conversational prompts which work with multiple chat prompts will continue to deliver optimum results in the same way that refining a web search typically yields the best answer.
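The role, instruction and example structure described in the list above can be sketched in a few lines of Python. This is only an illustration of how such a prompt might be assembled as plain text. The function name `build_prompt` and the placeholder examples are invented for this guide, and no particular chatbot requires this exact layout.

```python
def build_prompt(role, instruction, examples=()):
    """Assemble a prompt from a role, an instruction and optional examples.

    No examples gives a zero-shot prompt, one example gives one-shot,
    and several examples give few-shot.
    """
    parts = [f"You are {role}.", instruction]
    for number, example in enumerate(examples, start=1):
        parts.append(f"Example {number}: {example}")
    return "\n".join(parts)


# Zero-shot: a role and an instruction, but no examples.
zero_shot = build_prompt("an expert poet", "Write me a haiku about chickens.")

# Few-shot: the same prompt plus placeholder examples (stand-ins, not real haiku).
few_shot = build_prompt(
    "an expert poet",
    "Write me a haiku about chickens.",
    examples=["<first example haiku>", "<second example haiku>"],
)

print(few_shot)
```

The resulting text is what gets sent to the model. The examples take up room in the context window, which is why the size of the window matters when deciding how many to include.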
Tokens and Tokenization
Tokens are used in both the pre-training and prompt interaction with models. Tokenization breaks the input text down into tokens representing individual words or subwords, so the model can understand the input and process it (aka run inference, see above).
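As a rough illustration of splitting text into words and subwords, the Python sketch below chops each word into the longest pieces it can find in a small, hand-written vocabulary. Both the vocabulary and the longest-match rule are assumptions chosen to keep the example short. Real tokenizers learn their vocabularies from huge amounts of text and use more sophisticated methods.

```python
# A tiny hand-written vocabulary (hypothetical, for illustration only).
VOCAB = {"un", "believ", "able", "token", "ization", "is"}


def tokenize_word(word, vocab):
    """Split one word into the longest vocabulary pieces, left to right."""
    tokens = []
    start = 0
    while start < len(word):
        # Try the longest possible piece first, then shrink it.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                tokens.append(piece)
                start = end
                break
        else:
            # No known piece matched: fall back to a single character.
            tokens.append(word[start])
            start += 1
    return tokens


def tokenize(text, vocab=VOCAB):
    """Lower-case the text, split on spaces, then split each word into tokens."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(tokenize_word(word, vocab))
    return tokens


print(tokenize("Tokenization is unbelievable"))
# ['token', 'ization', 'is', 'un', 'believ', 'able']
```

Notice that three words became six tokens. This is why context windows and usage limits are counted in tokens rather than words.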
Parameters
Parameters are the crucial values used by a model’s neural network to manage the way it processes data. They include weights, biases and other mathematically derived elements which impact the way the model generates its output. Parameters are tunable, and are derived during training. In general parameters determine how the model behaves, in much the same way as the varying amount of each ingredient in a recipe governs how the final dish tastes.
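For the curious, here is a minimal Python sketch of what weights and biases actually do, plus a way to count the parameters in a simple network. It assumes a plain ‘fully connected’ design where every value in one layer feeds every neuron in the next. The layer sizes are invented, and real models use far more elaborate layouts with billions of parameters.

```python
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias: the basic unit of a network."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def count_parameters(layer_sizes):
    """Count the weights and biases in a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between layers
        total += n_out         # one bias per neuron in the receiving layer
    return total


# Two inputs, two weights and one bias produce a single output value.
print(neuron([1.0, 2.0], [0.5, -1.0], 0.25))  # -1.25

# A toy network with 4 inputs, 8 hidden neurons and 2 outputs.
print(count_parameters([4, 8, 2]))  # 58
```

Training is the process of nudging every one of those numbers until the outputs are useful.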
Coherence
Coherence relates to how logical and consistent the text or image outputs of an AI model are on completion. Incoherent results typically result in garbled or nonsense text or in images which make no sense. Coherence can also be adversely affected by a context window which is too small.
Hallucination
Hallucination is often a by-product of incoherence in text inference. It consists of lies or complete nonsense which is output as a result of a prompt. It can mean the model is too small to cope with the request (not enough data) or the context window has restricted a coherent answer, and so the model hallucinates in order to satisfy the request as best it can.
Temperature
A key factor governing the output of a model. It is a parameter which controls the randomness of AI generated output. It can also be known as ‘creativity’ in image generation. It is user adjusted at the time of prompting, and higher temperatures can create weird and wonderful results or complete hallucinations. Conversely lower temperature values (i.e. below 1.0) will produce more focused and expected results.
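One common way temperature is applied is to divide the model’s raw scores for each candidate word by the temperature before turning them into probabilities. The Python sketch below shows that calculation. The three scores are made-up numbers, and this is a simplified picture of what a real model does.

```python
import math


def softmax_with_temperature(scores, temperature=1.0):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    biggest = max(scaled)  # subtracting the maximum keeps exp() from overflowing
    exps = [math.exp(s - biggest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three candidate next words.
scores = [2.0, 1.0, 0.5]
for temperature in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(scores, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At the lower temperature the top-scoring word takes a bigger share of the probability, so the output is more predictable. At the higher temperature the shares even out, so less likely words get picked more often.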
Fine-tuning
Adapting a pre-trained model to do a specific task or range of tasks using additional data. The advantage of fine-tuning a large model is a reduction in model size and training/use costs. This is because the model no longer has to be a jack of all trades, but instead can be a master of one. All specialist models, for example medical or coding chatbots, have been fine-tuned from a larger model to produce a more streamlined and effective tool for use in its niche.
NB: Fine-tuning covers an extremely wide area. All chatbots, specialist models, even models designed to be run on a local computer or phone, have likely been fine-tuned from a Foundational LLM. GPT-4o is a fine-tuned version of GPT-4 which has additional conversational and multimodal skills which make it a perfect personal assistant. Or girlfriend!
Training
Typically called pre-training, this is the base training given to a model to make it function as an AI entity. This training can be supervised (as in showing a model labelled images to teach it what a cat is) or self-supervised, e.g. giving the model a set of base rules to follow, and then letting it work out its proper functionality on its own. Training is often connected with human evaluation, particularly Reinforcement Learning from Human Feedback (RLHF).
RLHF
Using human feedback and rewards to improve the results of an LLM’s operation. This trial and error technique is crucially important where the training is complex, such as trying to teach a model what ‘funny’ is. By injecting human feedback and reinforcing correct ‘guesses’ by the model, it can be trained to recognise ‘funny’ in circumstances which it hasn’t yet encountered or been directly trained on. At the end the model will have a ‘policy’ it can use for all future similar needs.
Quantization
Reducing the precision of model structures to lower memory requirements and improve model speed, while maintaining acceptable output performance. Quantization is typically used with open source models to reduce their size, so they can work on devices with low memory like laptops and phones.
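Here is a small Python sketch of the idea, using one simple scheme: squeeze each decimal number into a whole number between -127 and 127, which fits in a single byte. The sample weights are invented, and real quantization methods are more refined, but the trade-off is the same: less memory in exchange for a small loss of accuracy.

```python
def quantize_int8(values):
    """Map floats onto whole numbers in [-127, 127] using one shared scale."""
    largest = max(abs(v) for v in values)
    scale = largest / 127 if largest else 1.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored whole numbers."""
    return [q * scale for q in quantized]


# Hypothetical model weights stored as full-precision floats.
weights = [0.82, -1.27, 0.003, 0.5]
stored, scale = quantize_int8(weights)
restored = dequantize(stored, scale)

print(stored)
print([round(r, 3) for r in restored])
```

Very small weights, like the 0.003 here, get rounded away to zero. That is the ‘acceptable’ loss of precision the technique relies on.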
Checkpoint
A snapshot of a model's state at a particular point during training. This allows for future retraining, while also providing access for public use and inference.
Mixture of Experts
Combining multiple specialized models (experts) to improve AI performance. By routing inputs to the most relevant expert, smaller models can operate with large model speed and efficiency.
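The routing idea can be sketched in Python with a toy example. Here the ‘experts’ are ordinary functions and the router simply counts keyword matches. Everything in it, from the expert names to the keyword lists, is invented for illustration. In a real mixture of experts model the router is learned during training and sits inside the neural network itself.

```python
def math_expert(question):
    return "[math expert] " + question


def code_expert(question):
    return "[code expert] " + question


def general_expert(question):
    return "[general expert] " + question


# Hypothetical keyword lists standing in for a learned router.
EXPERTS = {
    "math": ({"sum", "equation", "multiply"}, math_expert),
    "code": ({"python", "bug", "function"}, code_expert),
}


def route(question):
    """Pick the expert whose keywords overlap most with the question."""
    words = set(question.lower().replace("?", "").split())
    best_expert, best_score = general_expert, 0
    for keywords, expert in EXPERTS.values():
        score = len(words & keywords)
        if score > best_score:
            best_expert, best_score = expert, score
    return best_expert


def answer(question):
    """Send the question to whichever expert the router selects."""
    return route(question)(question)


print(answer("Why does my Python function have a bug?"))
```

Only the chosen expert does any work for a given question, which is where the speed and efficiency savings come from.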
Benchmarking
Benchmarks are used to measure and compare a model’s performance and utility against other models in a variety of tasks. There are a number of generally accepted standard tests which are used to measure model performance. An example of an LLM leaderboard can be found from OpenLM.
TOPS
TOPS — or Tera Operations per Second — is a measure of performance in computing and is particularly useful when comparing Neural Processing Units (NPU) or AI accelerators that have to perform calculations quickly.
It is an indication of the number of trillion operations a processor can handle in a single second. This is crucial for tasks like image recognition, generation and other large language model-related applications. The higher the value, the better it will perform at those tasks — getting you that text or image quicker.
Intel and others suggest 40 TOPS is the minimum required for running something like Microsoft Copilot locally on a laptop without performance degradation, and most systems with an NPU are at or near that point.
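The arithmetic behind a TOPS rating is simple enough to sketch in Python. The operation count used below is a made-up figure purely to show the sum, and real-world speed also depends on memory, software and how well a task suits the chip.

```python
def seconds_for_task(operations, tops):
    """Ideal time to finish a task on a chip rated at `tops` trillion ops per second."""
    return operations / (tops * 1e12)


# Hypothetical task needing 80 trillion operations.
task_operations = 80e12

print(seconds_for_task(task_operations, 40))  # 2.0 seconds on a 40 TOPS chip
print(seconds_for_task(task_operations, 10))  # 8.0 seconds on a 10 TOPS chip
```

So, all else being equal, four times the TOPS means the same job finishes in a quarter of the time.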
Safety
Super-alignment
There is significant concern about the risks involved in driving towards an AI system which could turn out to be far more intelligent than its human creators. The question is how AI developers can ‘align’ any future AI so it will always operate in line with human ethical values, and not turn rogue. It may sound like science fiction, but it's probably better to think about it now rather than later. Especially if we're heading towards ASI (see above).
The jury is out as to whether we are doing enough, fast enough to prevent a possible disaster. Where for example, a rogue super AI decides that warring humans are a threat to its existence, and decides to take steps to control human actions in some way or another.
Deepfakes
In the short term, the remarkable capacity of AI models to create almost anything humans can think of has raised the risk of ‘deepfakes’. This is fake multimedia content such as video, audio or images, which reflects a false reality. For example a fake video of a politician saying something outrageous, or a celebrity doing something appalling. Tools are being deployed to detect such fake activity, but it seems to be turning into an arms race, in the same way we fight spam.
Jailbreaking
Jailbreaking is the practice of circumventing the filters and safeguards that are shipped with most modern AI models, which are there to prevent abuse. Abuse includes creation of hate content, depravity and other proscribed social material. The jailbreaking techniques include prompt floods, where the model’s context window is deliberately overloaded with prompts to break down any barriers to delivering unsafe results. Every current model is vulnerable to jailbreaking techniques of different types. Uncensored models – typically open source – generally do not have any safeguards in place.
Frontier AI
These are highly advanced foundation models that could conceivably pose severe risks to public safety. These risks could include creating cybersecurity threats, or destabilizing society in one way or another. A large amount of work is being done to develop precautions, and to foster collaboration between global AI developers, governments and law enforcement to mitigate the risks of something going awry.
Miscellaneous
Singularity
The singularity — or technological singularity — is a hypothetical point in the far future where technological progress reaches a point where it exceeds human capacity to manage or control events. At this point, often portrayed as a dystopian climax in science fiction, humanity becomes subservient to its computers, AI and the mechanical universe.
AI Bias
Bias is the extent to which a model has been trained with a dataset which exhibits bias towards one or many aspects of a particular worldview. This can include cultural bias, prejudice towards minority groups and other aspects which could result in distorted or abusive results. Problems like racial bias in AI face recognition systems and medical systems have already resulted in skewed performance which directly or indirectly causes harm to sections of society.
Knowledge Cut Off
Every model is trained up to a certain point in time before being released to the public. The knowledge cut off is the latest date of the information available to the model. So for example, if a model has a knowledge cut off of 31st December 2023, then no data after that date has been included in its pre-training or training data sets. Therefore an event which happens in January 2024 will not be available to users of the AI model until the date is extended with further training, or with the addition of live internet access.
Reasoning
Reasoning, self awareness and emergent skills are widely accepted to be the hallmarks of advanced AGI systems. This is because at this stage the algorithm is making human-like deductions and inferences which it has not been specifically trained on. It is ‘thinking’ like a human. The big question is how to determine if or when simulated reasoning has transitioned over into genuine cognitive activity. Many people believe this will never happen.
Text-to-Speech (TTS) and Speech-to-Text (STT)
Text-to-Speech (TTS), often also known as ‘read aloud technology’, turns on-screen (or in-system) text content into sound, and audibly reads the result to the user. Similarly, speech-to-text (STT) models will accept and process user audio prompts, convert them to text and process them for action as normal.
The ultimate ‘AI digital assistants’ such as the one featured in the movie Her – and the new GPT-4o – make fast, powerful use of both TTS and STT as part of their basic function. Imagine having a chat with your computer without needing a keyboard. The technologists clearly believe it’s the future of our interaction with technology in general.
API (Application Programming Interface)
A set of protocols and tools that allow different software applications to communicate and interact with each other. In the AI ecosystem this provides a quick and easy route for large model AI to be embedded in different applications such as web browsers or plugins. This can give users the power to remotely interact with LLMs, even though they may not have computers powerful enough to handle the processing locally.