Elon Musk and the team behind xAI have achieved an engineering marvel, setting up a supercluster of 100,000 Nvidia H200 GPUs in just 19 days. Nvidia CEO Jensen Huang shared the story of Elon Musk's incredible installation prowess with members of Tesla Owners Silicon Valley on X.
Huang describes Musk's 19-day effort with awe and respect, calling it "superhuman". The team at xAI purportedly went from the "concept" phase to full compatibility with Nvidia's "gear" in less than three weeks, a span that also included xAI's first AI training run on the newly built supercluster.
Elon Musk is super human. What would take everyone else a year, only took him 19 days. pic.twitter.com/q51sM48lsu (October 13, 2024)
From start to finish, the process involved building the massive X factory where the GPUs would reside and equipping the entire facility with the liquid cooling and power needed to make all 100,000 GPUs operational. That's not to mention the coordination between Nvidia's and Elon Musk's engineering teams required to get all of the hardware and infrastructure shipped and installed precisely.
For perspective, Huang states that a typical data center takes four years to do what Elon Musk and his team were able to do in 19 days. Three of those years would be dedicated to planning alone, while the last year would be used to ship the equipment, install it, and get it all working.
Huang also goes into detail describing how complex the networking is on Nvidia's hardware. He explains that networking Nvidia's gear isn't like networking traditional data center servers. "The number of wires that goes in one node...the back of a computer is all wires."
Elon Musk's integration of 100,000 H200 GPUs has "never been done before", according to Jensen Huang, and probably won't be duplicated by another company, at least not for a very long time.