Fortune
Jeremy Kahn

Is Meta's new Llama AI model a game changer?

Meta CEO Mark Zuckerberg (Credit: Jason Henry—Bloomberg via Getty Images)

Hello and welcome to Eye on AI.

People are buzzing about today’s release of Meta’s new Llama 3.1 model. What’s notable is that this is Meta’s largest Llama model to date, with 405 billion parameters. (Parameters are the adjustable variables in a neural network, and give a rough sense of how large an AI model is.) And according to benchmark performance figures that conveniently leaked onto Reddit the day ahead of the official release, Llama 3.1 exceeds the capabilities of OpenAI’s latest and greatest model, GPT-4o, by a few percentage points across a number of measures, including some benchmarks designed to test reasoning.

Not only that, but Llama 3.1 is, like the other Llama models Meta has released, an “open model,” meaning anyone can potentially build their own applications on top of it without paying, and even modify the model in any way they desire. But the models Meta has released before have been smaller and less capable than the leading proprietary models, such as OpenAI's GPT-4, Anthropic's Claude 3 Opus, or Google's Gemini Ultra and Gemini 1.5 Pro. The fact that Meta's new Llama 3.1 may have now closed the gap to GPT-4o has a lot of people excited that Llama 3.1 405B will be the model that finally enables many businesses to really unlock the return on investment from generative AI.

Anton McGonnell, head of software products at SambaNova Systems, which builds AI hardware and software for big companies, said in a statement that Llama 3.1 405B might be a game changer because companies can use the 405-billion-parameter model to create synthetic datasets for training or fine-tuning smaller open models, honing them for specific applications. This “distillation” process has been possible before, but there were often ethical concerns about how the distillation data had been sourced (with data scraped from the web without consent, or derived from the work of poorly paid human contractors).
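For readers who want a concrete picture, here is a rough sketch of that distillation workflow. The call to the 405B “teacher” model is stubbed out, and none of the function names below are a real API; in practice the stub would be replaced with a request to whichever provider is hosting Llama 3.1 405B.

```python
# Sketch of distillation via synthetic data: a large "teacher" model labels
# unlabeled prompts, and the resulting dataset is used to fine-tune a small
# open model. The teacher call is a stand-in, not a real API.

def teacher_generate(prompt: str) -> str:
    """Stand-in for a call to a hosted Llama 3.1 405B endpoint."""
    return f"Synthetic answer to: {prompt}"

def build_synthetic_dataset(prompts):
    """Pair each unlabeled prompt with the teacher model's output."""
    return [{"prompt": p, "completion": teacher_generate(p)} for p in prompts]

prompts = [
    "Summarize our refund policy.",
    "Draft a support reply about late shipping.",
]
dataset = build_synthetic_dataset(prompts)
# `dataset` would then be fed to a fine-tuning job for a small open model,
# such as the 8-billion-parameter Llama 3.1.
```

Because Llama 3.1's license explicitly permits this kind of use, companies can build such datasets without the sourcing concerns that dogged earlier distillation efforts.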

McGonnell also applauded Meta’s decision to release Llama 3.1 405B as part of a family of Llama models of different sizes (there are also upgraded 70 billion- and 8 billion-parameter models) and to release a “Llama stack.” This is a set of related software built on top of and around the AI models themselves. Meta's AI stack includes guardrails software, to prevent the AI models from generating harmful or dangerous content, and security software to try to prevent prompt injection attacks against the Llama models. The family of models and the AI stack, McGonnell said, create the possibility of chaining open models together in a way that would be especially cost-effective—using a process in which parts of a user’s query or an application are handled by small, fine-tuned models, and only those more difficult aspects that these models can’t handle are handed off to the full-scale 405 billion parameter model.
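The chaining idea McGonnell describes is sometimes called a model cascade. The sketch below shows only the routing logic; both model calls are stubs, and the confidence heuristic is purely illustrative (a real system might use log-probabilities or a separate verifier model).

```python
# Minimal sketch of a model cascade: a small, cheap model answers first,
# and only low-confidence queries escalate to the expensive 405B model.

def small_model(query: str):
    """Stub for a fine-tuned 8B model; returns (answer, confidence)."""
    # Toy heuristic: treat short queries as easy. Illustrative only.
    confidence = 0.9 if len(query.split()) < 8 else 0.3
    return f"[8B] {query}", confidence

def large_model(query: str) -> str:
    """Stub for the full 405B model."""
    return f"[405B] {query}"

def route(query: str, threshold: float = 0.7) -> str:
    answer, confidence = small_model(query)
    if confidence >= threshold:
        return answer          # cheap path: small model is confident
    return large_model(query)  # expensive fallback for hard queries
```

The economics come from the threshold: the more traffic the small model can confidently absorb, the less often an application pays 405B-scale inference costs.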

But McGonnell’s enthusiasm aside, there’s a catch—actually a bunch of them. The model is so big that it can’t easily be hosted on a single GPU or even a dozen of them. (Meta’s 70 billion parameter version of Llama 3 can potentially be run on two high-end Nvidia GPUs.) That means companies might have to pay for a lot of very expensive GPUs in the cloud to run the model, and they will need rare technical expertise in how to split an AI workload across those GPUs and then recombine the results into a single output. To overcome those two issues, Meta is partnering with a bunch of companies, such as the AI services and data analytics company Databricks and the cloud service providers AWS, Microsoft Azure, Google Cloud, Nvidia Foundry, and others to host the model and offer tools and services around it. It has also partnered with Groq, a hardware company that builds an alternative computer chip to Nvidia’s GPUs designed specifically for running AI workloads on trained models, to help lower the cost of running such a large model and speed up the time it takes the model to generate an output.

Such an arrangement starts to make access to Llama 3.1 405B look a lot more like accessing a proprietary model through an application programming interface (API), which is what OpenAI, Anthropic, and Google Gemini offer (Google also offers some open models, called Gemma). It’s not clear yet how the costs of hosting and accessing your own Llama 3.1 model through one of Meta’s partners will compare to simply building on top of OpenAI’s GPT-4o or Claude 3 Opus. Some developers have reportedly complained that hosting their own version of Llama 3’s 70 billion parameter model was sometimes more expensive than simply paying OpenAI on a per-token basis to access the more capable GPT-4 model.

It also isn’t clear yet how much developers will be able to tinker with the parameters of the Llama 3.1 model they are running on the servers of one of Meta’s partners, which presumably may be using the same model to run inference for several customers in order to maximize the return on their own hardware investment needed to host such a big model. If these partners limit how much developers can adjust the model’s weights, that may negate some of the advantages of using the open model. It also isn’t clear yet exactly what commercial licensing restrictions Meta has placed on the use of Llama 3.1 405B.

In the past, the restrictions Meta has placed around the licensing of its Llama models have led open-source software purists to complain that Meta has twisted the meaning of open-source beyond recognition and that these models should not be called “open-source software” at all. Hence the growing use of the term “open model” as opposed to “open-source model.”

As with all open models, there are also some real concerns about AI safety here. Meta has not revealed the results of any red-teaming or safety testing it has done on the model. More capable models are generally more dangerous—a bad actor could more easily use them to suggest recipes for bioweapons or chemical weapons, to develop malicious software code, or to run highly automated disinformation campaigns, phishing schemes, or frauds. And as with all open models, it is easy for a sophisticated AI developer to remove any guardrails Meta has engineered into the baseline model.

Finally, as capable as Llama 3.1 405B may be, it will likely be superseded soon by even more capable proprietary models. Google is working on Project Astra, an AI model that will be more “agentic”—able to take actions, not just generate text or images. At Fortune’s Brainstorm Tech conference last week, Google’s chief research scientist Jeff Dean told me that Google will likely begin rolling this model out to some test users as soon as the fall. OpenAI is known to be training GPT-5, which will certainly be more capable than GPT-4o and may also have agentic properties. Anthropic is no doubt training a model that goes beyond Claude 3 Opus, its most powerful model, and also working on an AI agent.

All of this just underscores how competitive the market for AI “foundation models” (models on which many different kinds of AI applications can be built) has become, and how difficult it will be for any AI startups working on such models to survive as independent entities. That may not bode well for investors in hot French AI startups Mistral and H, or other independent foundation model companies like Cohere, or even somewhat more specialized AI model companies such as Character AI and Essential AI. It may be that only the biggest tech players, or those closely associated with them, will be able to keep pushing the boundaries of what these models can do.

The good news for the rest of us is that, despite the caveats I’ve listed above, this foundation model race is actually driving down the cost of implementing AI models. While overall AI spending is continuing to climb as companies begin to deploy AI models more widely across their organizations, on a per-output basis, “the cost of intelligence” is falling dramatically. This should mean more companies will begin to see a return on investment from generative AI, accelerating the dawn of this new AI era.

With that, here’s more AI news.

Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn

Before we get to the news... If you want to learn more about AI and its likely impacts on our companies, our jobs, our society, and even our own personal lives, please consider picking up a copy of my new book, Mastering AI: A Survival Guide to Our Superpowered Future. It's out now in the U.S. from Simon & Schuster and you can order a copy today here. If you live in the U.K., the book will be published by Bedford Square Publishers next week and you can preorder a copy today here.
