Say what you will about Elon Musk, but when the technological disrupter sets his mind to something, he plays to win.
Founded only in July of last year, his latest artificial intelligence startup, xAI, just brought a new supercomputer dubbed Colossus online over Labor Day weekend. The system is designed to train the company's large language model (LLM), Grok, a rival to OpenAI's better-known GPT-4.
While Grok is limited to paying subscribers of Musk’s X social media platform, many Tesla experts speculate it will eventually form the artificial intelligence powering the EV manufacturer’s humanoid robot Optimus.
Musk estimates this strategic lighthouse project could eventually earn Tesla $1 trillion in profits annually.
Located in Memphis, Tennessee, the new xAI data center houses 100,000 of Nvidia's benchmark Hopper H100 processors, more than any other single AI compute cluster.
“From start to finish, it was done in 122 days,” Musk wrote, calling Colossus “the most powerful AI training system in the world.”
This weekend, the @xAI team brought our Colossus 100k H100 training cluster online. From start to finish, it was done in 122 days. Colossus is the most powerful AI training system in the world. Moreover, it will double in size to 200k (50k H200s) in a few months. Excellent…

— Elon Musk (@elonmusk) September 2, 2024
It doesn't end there for xAI, either: Musk expects to double Colossus's compute capacity within months, once he can procure 50,000 of Nvidia's newer, more advanced H200-series chips, which are roughly twice as powerful as the H100.
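As a rough sketch of the arithmetic behind that doubling claim (treating each H200 as about twice the training throughput of an H100, an assumption drawn from the figures above rather than any official benchmark):

```python
# Back-of-envelope check on the "doubling" claim, using the article's rough figures.
# Assumption (not an official benchmark): one H200 ~ 2x the training throughput of one H100.
current_h100s = 100_000    # H100s in Colossus today
planned_h200s = 50_000     # H200s Musk says he plans to add
h200_per_h100 = 2.0        # assumed relative throughput

added_capacity = planned_h200s * h200_per_h100            # ~100,000 H100-equivalents
growth = (current_h100s + added_capacity) / current_h100s
print(f"Cluster compute grows roughly {growth:.1f}x")     # -> ~2.0x, i.e. it doubles
```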
Musk and xAI did not respond to requests from Fortune for comment.
Built to train Grok-3, potentially the next leader in AI models
The pace at which Colossus was set up is blistering, given that xAI selected its Memphis site only in June.
What's more, Musk is competing against several heavy-hitting tech firms, including Microsoft, Google, and Amazon, to acquire Nvidia's prized Hopper-series AI chips amid the current AI gold rush.
But the AI entrepreneur is a valued Nvidia customer who has pledged to spend $3 billion to $4 billion this year on CEO Jensen Huang's hardware for Tesla alone.
Moreover, xAI enjoyed a head start by helping itself to Tesla’s supply of AI chips already delivered to the EV manufacturer.
The Memphis cluster will train Musk’s third generation of Grok.
"We're hoping to release Grok-3 by December, and Grok-3 should be the most powerful AI in the world at that point," he told conservative podcaster Jordan Peterson in July.
An early beta of Grok-2 just rolled out to users last month.
It was trained on only around 15,000 Nvidia H100 graphics processors, yet it already ranks among the most capable large language models on competitive chatbot leaderboards.
Upping that GPU count nearly sevenfold suggests Musk has no intention of surrendering the race to develop artificial general intelligence to OpenAI, which he helped cofound in late 2015 after becoming worried that Google was dominating the technology.
Musk later fell out with CEO Sam Altman and is now suing the company a second time.
To help even the odds, xAI raised $6 billion in a Series B funding round in May, with the help of venture capital firms like Andreessen Horowitz and Sequoia Capital, as well as deep-pocketed investors like Fidelity and Saudi Prince Alwaleed bin Talal's Kingdom Holding.
Tesla could be the next company to invest in Musk's xAI
Musk has also indicated he would ask Tesla's board to vote on whether to invest as much as $5 billion into xAI, a step welcomed by a number of shareholders.
Of the roughly $10B in AI-related expenditures I said Tesla would make this year, about half is internal, primarily the Tesla-designed AI inference computer and sensors present in all of our cars, plus Dojo. For building the AI training superclusters, NVidia hardware is about…

— Elon Musk (@elonmusk) June 4, 2024
xAI's supercomputer cluster has caused alarm in Memphis, however, given the extreme haste with which city officials agreed to the project. The site does bring economic activity back to a part of the city that last housed an Electrolux white-goods factory.
One main concern is the strain it will put on the city's resources: officials at municipal utility MLGW estimate that Colossus will require up to 1 million gallons of water per day to cool its servers and will consume as much as 150 megawatts of power.
But Musk only thinks big, and to him anything worth doing is worth doing fast; otherwise you risk falling behind the competition.
Speaking to Lex Fridman after the podcaster visited xAI’s rapidly growing operations, Musk said speed was a key part of his five-step management process.
“Any given thing can be sped up. However fast you think it can be done,” he said last month, “it can be done faster.”