The rapid expansion of generative AI models requires ever more powerful hardware, and datacenters housing hundreds of thousands of AI GPUs are quickly pushing the limits of current datacenter infrastructure; soon they could hit the limits of the power grid. While AWS, Microsoft, and Oracle plan to use nuclear power plants to power their datacenters, Microsoft Azure's chief technology officer, Mark Russinovich, suggests that connecting multiple datacenters may soon be necessary to train advanced AI models, reports Semafor.
Modern AI datacenters, such as those built by Elon Musk's companies Tesla and xAI, can house 100,000 Nvidia H100 or H200 GPUs, and as American giants compete to train the industry's best AI models, they are going to need even more AI processors working in concert as a unified system. As a consequence, datacenters are becoming ever more power hungry due to the growing number of processors, the higher power consumption of each processor, and the power required for cooling. Datacenters consuming multiple gigawatts of power could soon become a reality. But the U.S. energy grid is already under strain, especially during periods of high demand such as hot summer days, so there are concerns that it may not be able to keep up.
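For a sense of how the gigawatt figures arise, here is a back-of-envelope sketch. The ~700 W figure is the published TDP of an H100 SXM module; the host-overhead and cooling (PUE) multipliers are illustrative assumptions, not numbers from the article.

```python
# Rough estimate of the power draw of a 100,000-GPU cluster.
# GPU TDP is the published H100 SXM figure; the overhead and PUE
# multipliers are assumptions for illustration only.

NUM_GPUS = 100_000          # GPUs in one large AI cluster
GPU_TDP_W = 700             # H100 SXM thermal design power, watts
HOST_OVERHEAD = 1.5         # assumed multiplier for CPUs, memory, networking
PUE = 1.3                   # assumed power usage effectiveness (cooling, losses)

it_power_mw = NUM_GPUS * GPU_TDP_W * HOST_OVERHEAD / 1e6
facility_power_mw = it_power_mw * PUE

print(f"GPU + host power: {it_power_mw:.0f} MW")
print(f"Facility power:   {facility_power_mw:.0f} MW")
# -> on the order of 100 MW of IT load and well over 130 MW at the facility
#    level; scaling toward a million GPUs pushes demand past a gigawatt.
```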
To address these challenges, Microsoft is making significant investments in energy infrastructure. The company recently signed a deal to reopen the Three Mile Island nuclear power plant to secure a more stable energy supply, and before that it invested tens of billions of dollars in AI infrastructure. But that may not be enough, and at some point large companies will have to connect multiple datacenters to train their most sophisticated models, says Microsoft Azure's CTO.
"I think it is inevitable, especially when you get to the kind of scale that these things are getting to," Russinovich told Semafor. "In some cases, that might be the only feasible way to train them is to go across datacenters, or even across regions. […] I do not think we are too far away."
On paper, this approach would relieve the growing strain on power grids and sidestep the practical limits of training in a single centralized facility. However, the strategy comes with major technical challenges of its own, particularly in keeping datacenters synchronized and maintaining the high communication speeds required for effective AI training.
Communication between thousands of AI processors within a single datacenter is already a challenge, and spreading the process across multiple sites only adds complexity. Advances in fiber-optic technology have made long-distance data transmission faster, but coordinating training across multiple locations remains a significant hurdle. To mitigate these issues, Russinovich suggests that the datacenters in a distributed system would need to be relatively close to one another. In addition, implementing a multi-datacenter approach would require collaboration across multiple teams within Microsoft and its partner OpenAI, which means decentralized AI training methods would have to be developed within Microsoft.
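To make the communication hurdle concrete, the sketch below estimates how much data a single gradient synchronization would move between sites. The model size, gradient precision, and link bandwidth are illustrative assumptions, not figures from the article or from Microsoft.

```python
# Why cross-site synchronization is hard: averaging the gradients of a large
# model means moving its full gradient buffer between sites on every sync.
# All numbers below are assumptions chosen only to illustrate the scale.

PARAMS = 1e12               # assumed 1-trillion-parameter model
BYTES_PER_GRAD = 2          # bf16 gradients, 2 bytes each
LINK_GBPS = 800             # assumed aggregate cross-datacenter link, Gbit/s

grad_bytes = PARAMS * BYTES_PER_GRAD
seconds_per_sync = grad_bytes * 8 / (LINK_GBPS * 1e9)

print(f"Gradient payload per sync: {grad_bytes / 1e12:.1f} TB")
print(f"Time on the wire per sync: {seconds_per_sync:.0f} s")
# -> ~2 TB and ~20 s per synchronization step, before accounting for latency,
#    which is why sites must either sit close together or sync less often.
```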
There is a catch with decentralized AI training methods: once developed, they offer a potential way to reduce reliance on the most advanced GPUs and the largest datacenters. That could lower the barrier to entry for smaller companies and individuals looking to train AI models without massive computational resources. Interestingly, Chinese researchers have already used decentralized methods to train their AI models across multiple datacenters, although the details are scarce.
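A minimal sketch of what such a method can look like, assuming a local-SGD-style scheme with periodic parameter averaging (a well-known approach in the research literature, not a description of Microsoft's, OpenAI's, or the Chinese researchers' actual systems):

```python
# Toy illustration of decentralized training: each "site" runs many local
# gradient steps on its own data and only the model parameters are averaged
# across sites occasionally (local SGD / periodic parameter averaging).
# The 1-D quadratic loss and all settings here are purely illustrative.

import random

def grad(w, x):
    """Gradient of the toy loss 0.5 * (w - x)^2 for one data point x."""
    return w - x

def train_decentralized(sites_data, rounds=20, local_steps=50, lr=0.05):
    weights = [0.0] * len(sites_data)      # every site starts from the same model
    for _ in range(rounds):
        # Local phase: no cross-site traffic at all.
        for i, data in enumerate(sites_data):
            for _ in range(local_steps):
                x = random.choice(data)
                weights[i] -= lr * grad(weights[i], x)
        # Sync phase: one parameter exchange per round instead of per step.
        avg = sum(weights) / len(weights)
        weights = [avg] * len(weights)
    return weights[0]

# Two "datacenters" holding different slices of the data.
site_a = [1.8, 2.1, 2.0, 1.9]
site_b = [4.0, 3.9, 4.2, 4.1]
print(f"Converged weight: {train_decentralized([site_a, site_b]):.2f}")
# -> roughly 3.0, the mean over all data, reached with far fewer cross-site
#    messages than synchronizing after every single gradient step.
```

The trade-off the sketch exposes is the same one Russinovich alludes to: the less often sites synchronize, the less bandwidth and latency matter, but the further each site's copy of the model can drift between exchanges.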