Elon Musk's AI company, xAI, is set to expand its Colossus supercomputer to over one million GPUs, reports the Financial Times. At that scale, Colossus would be one of the most powerful supercomputers in the world. However, the expansion will require significant investment, a steady supply of hardware, and supporting infrastructure.
Colossus, which is used to train the large language model behind Grok, already operates over 100,000 Nvidia H100 processors and is set to double that number shortly to become the largest supercomputer in a single building. The larger expansion is underway, though it will take a considerable amount of time and effort. To carry it out, xAI is working with Nvidia, Dell, and Supermicro. Furthermore, Memphis, Tennessee, where Colossus is located, has reportedly established a dedicated xAI operations team to aid the endeavor.
It is unclear whether xAI plans to use current-generation Hopper or next-generation Blackwell GPUs for the expansion. The Blackwell platform is expected to scale better than Hopper, so it makes more sense to use the upcoming technology instead of the current one. Either way, procuring the remaining 800,000 to 900,000 AI GPUs will be hard, as demand for Nvidia's products is overwhelming. Another challenge is making 1,000,000 GPUs work in concert with maximum efficiency, and here again Blackwell would make more sense.
The financial requirements of this expansion are colossal, of course. Acquiring GPUs that cost tens of thousands of dollars each, alongside infrastructure for power and cooling, could push investment into the tens of billions of dollars. xAI has raised $11 billion this year and recently secured another $5 billion. The company is currently valued at $45 billion.
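To see how the "tens of billions" figure comes about, here is a rough back-of-the-envelope sketch. The GPU counts are the 800,000 to 900,000 range cited above; the unit prices are illustrative assumptions standing in for "tens of thousands of dollars each", not reported figures, and the estimate deliberately leaves out power, cooling, networking, and buildings.

```python
# Rough GPU-only cost range for the Colossus expansion.
# GPU counts come from the article; unit prices are illustrative assumptions.

ADDITIONAL_GPUS = (800_000, 900_000)        # range cited in the article
ASSUMED_UNIT_PRICE_USD = (20_000, 40_000)   # hypothetical "tens of thousands" bounds


def gpu_cost_range(gpu_counts, unit_prices):
    """Return (low, high) total cost in USD from count and price ranges."""
    low = min(gpu_counts) * min(unit_prices)
    high = max(gpu_counts) * max(unit_prices)
    return low, high


low, high = gpu_cost_range(ADDITIONAL_GPUS, ASSUMED_UNIT_PRICE_USD)
print(f"GPU-only estimate: ${low / 1e9:.0f}B to ${high / 1e9:.0f}B")
# prints: GPU-only estimate: $16B to $36B
```

Under these assumed prices the GPUs alone would land somewhere between $16 billion and $36 billion, before any infrastructure spending is counted.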
Unlike rivals such as OpenAI, which partners with Microsoft for computing power, and Anthropic, which is supported by Amazon, xAI is building its supercomputing capacity independently. This strategy puts the company in a high-stakes race to secure advanced AI hardware, but given the scale of xAI's investments, it may actually put Musk's company ahead of its rivals.
Despite its rapid progress, xAI has faced criticism for allegedly bypassing planning permissions and for the project's strain on the regional power grid. To address these concerns, the company has emphasized grid stability measures, including deploying Tesla's Megapack technology to manage power demands.
While xAI's focus on hardware has earned acclaim, its commercial offerings remain limited. Grok reportedly lags behind leading models like ChatGPT and Google's Gemini in both sophistication and user base. However, investors view Colossus as a foundational achievement that demonstrates xAI's ability to rapidly deploy cutting-edge technology.