Tom’s Hardware
Anton Shilov

Elon Musk confirms that Grok 3 is coming soon — pretraining took 10X more compute power than Grok 2 on 100,000 Nvidia H100 GPUs

Four banks of xAI's HGX H100 server racks, holding eight servers each.

Elon Musk has announced that pretraining of xAI's Grok 3 large language model (LLM) is complete, and that it took 10X more compute than Grok 2. He did not reveal many details, but based on the timing, Grok 3 was pretrained on the Colossus supercluster, which contains some 100,000 Nvidia H100 GPUs.

"Grok 3 is coming soon," Elon Musk wrote in an X post. "Pretraining is now complete with 10X more compute than Grok 2."

Given the timing and context, this confirms previous reports that xAI's Colossus supercomputer, which packs around 100,000 Nvidia H100 GPUs, was built specifically to accelerate large-scale AI projects. The mention of '10X more compute than Grok 2' further supports the idea that Grok 3's pretraining leveraged this immense computational infrastructure. Unsurprisingly, Grok 3's training data includes content generated by users of X.

Specific details about the computational infrastructure used to train Grok 2 have not been widely disclosed, but Musk's statement implies it used a considerably less powerful cluster, with roughly a tenth of the compute behind Grok 3. Still, Grok 2 was pretrained on powerful, if not groundbreaking, computational resources.

Companies like xAI need systems like Colossus to keep up with competitors such as OpenAI, Google DeepMind, and Anthropic. The ability to pretrain faster and at greater scale allows quicker deployment of cutting-edge models, such as LLMs like Grok 3 or GPT-4 that contain hundreds of billions of parameters. Training these models involves on the order of 10^24 to 10^25 floating-point operations. This is why Colossus will be expanded to 200,000 H100 and H200 GPUs in the coming months, so that Grok Next can be pretrained on an even more colossal system.
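For a sense of scale, a common rule of thumb from the scaling-law literature puts training compute at roughly 6 FLOPs per parameter per training token. The sketch below applies that rule to an illustrative model; the parameter count, token count, per-GPU throughput, and utilization figures are assumptions chosen for illustration, not disclosed Grok 3 specifications.

```python
# Back-of-envelope estimate of LLM pretraining compute and wall-clock time.
# All numbers below are illustrative assumptions, not disclosed Grok specs.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule of thumb: ~6 FLOPs per parameter per token (forward + backward)."""
    return 6.0 * n_params * n_tokens

def training_days(total_flops: float, n_gpus: int,
                  peak_flops_per_gpu: float, utilization: float) -> float:
    """Wall-clock days given cluster size, per-GPU peak, and sustained utilization."""
    effective_throughput = n_gpus * peak_flops_per_gpu * utilization
    return total_flops / effective_throughput / 86_400  # 86,400 seconds per day

# Assumed model: 300B parameters trained on 12T tokens (hypothetical).
flops = training_flops(300e9, 12e12)

# Assumed hardware: 100,000 H100s at ~1e15 BF16 FLOPS peak each, 40% utilization.
days = training_days(flops, 100_000, 1e15, 0.40)

print(f"Total compute:   {flops:.1e} FLOPs")  # ~2.2e+25
print(f"Wall-clock time: {days:.0f} days")    # ~6 days at these assumptions
```

On paper, a 100,000-GPU cluster finishes such a run in days; in practice, hardware failures, checkpointing, and lower sustained utilization stretch real training runs out considerably.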

It is noteworthy that xAI plans to eventually deploy a supercomputer powered by over a million GPUs. That version of Colossus will be used to train LLMs that will likely contain trillions of parameters and be far more accurate than Grok 3 or GPT-4o. Beyond a greater parameter count, newer models may also feature more advanced reasoning, bringing them closer to artificial general intelligence, the ultimate goal for companies like xAI and OpenAI.
