Billionaire Elon Musk has taken to Twitter / X to boast that his remarkable xAI data center is set to double its firepower “soon.” He was commenting on a recent video tour of his xAI Colossus AI supercomputer, in which TechTuber ServeTheHome marveled at gleaming rows of Supermicro servers packed with 100,000 state-of-the-art Nvidia enterprise GPUs.
So, the xAI Colossus AI supercomputer is, in Musk’s words, “Soon to become a 200k H100/H200 training cluster in a single building.” Its 100,000-GPU incarnation, which only started AI training about two weeks ago, was already notable. While “soon” might indeed be soon in this case, Musk’s prior tech timing slippages (e.g., Tesla's Full Self-Driving, Hyperloop delays, SolarCity struggles) mean we should be generally cautious about his forward-looking boasts.
The xAI Colossus has already been dubbed an engineering marvel. Importantly, praise for the supercomputer’s prowess isn’t limited to the usual Musk toadies. Nvidia CEO Jensen Huang also described the project as a “superhuman” feat that had “never been done before.” xAI engineers must have put in long hours to get the Colossus up and running in just 19 days; according to Huang, projects of this scale and complexity typically take as long as four years.
Elon Musk (via X): “Soon to become a 200k H100/H200 training cluster in a single building” https://t.co/2YvdmqXp1W, October 28, 2024
What will the 200,000 H100/H200 GPUs be used for? This very considerable computing resource will probably not be tasked with making scientific breakthroughs for the benefit of mankind. Instead, the 200,000 power-hungry GPUs are likely destined to train AI models and chatbots like Grok 3, ramping up the potency of its machine-learning-distilled ‘anti-woke’ retorts.
This isn’t the endgame for xAI Colossus hardware expansion, far from it. Musk has previously touted a Colossus with 300,000 Nvidia H200 GPUs throbbing within.
At the current pace of upgrades, we could even see Musk tweeting about reaching the 300,000-GPU goal before 2024 is out. If anything delays ‘Grok 300,000,’ it will likely be factors outside of Musk’s control, like GPU supplies. We have also previously reported that on-site power generation had to be beefed up to cope with even stage 1 of xAI's Colossus, so that’s another hurdle, alongside complex liquid cooling and networking hardware.
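To illustrate why power is such a hurdle, here’s a back-of-envelope sketch. It assumes the ~700W TDP of an Nvidia H100 SXM GPU and a rough 1.5x multiplier to cover CPUs, networking, and cooling; xAI hasn’t published official figures for Colossus, so the overhead factor is purely an assumption.

```python
# Back-of-envelope power estimate for a 200,000-GPU training cluster.
# Assumptions (not official xAI figures): ~700 W TDP per H100 SXM GPU,
# and a ~1.5x site overhead factor for CPUs, networking, and cooling.

GPU_COUNT = 200_000
GPU_TDP_WATTS = 700          # H100 SXM board power
SITE_OVERHEAD_FACTOR = 1.5   # rough multiplier for non-GPU draw (assumption)

gpu_power_mw = GPU_COUNT * GPU_TDP_WATTS / 1_000_000
site_power_mw = gpu_power_mw * SITE_OVERHEAD_FACTOR

print(f"GPU power alone: {gpu_power_mw:.0f} MW")        # ~140 MW
print(f"Estimated site power: {site_power_mw:.0f} MW")  # ~210 MW
```

Even by this rough math, a 200,000-GPU cluster lands in the hundreds-of-megawatts range, well beyond what a typical data center grid hookup supplies, which is why on-site generation came up in the first place.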