Tom’s Hardware
Technology
Anton Shilov

Nvidia and Oracle team up for Zettascale cluster: Available with up to 131,072 Blackwell GPUs


Oracle on Wednesday introduced new types of clusters that will be available for AI training through Oracle Cloud Infrastructure (OCI). The most powerful cluster will be based on Nvidia's upcoming Blackwell GPUs and will offer up to 2.4 zettaFLOPS of AI performance, making it even more powerful than Elon Musk's recently announced AI clusters.

Oracle's new supercomputer clusters can be configured with Nvidia's Hopper or Blackwell GPUs for AI and HPC workloads. Customers can also pick the networking gear, either ultra-low-latency RoCEv2 with ConnectX-7 NICs and ConnectX-8 SuperNICs or Nvidia's Quantum-2 InfiniBand-based networks, as well as the HPC storage, depending on performance needs:

  • OCI Superclusters equipped with H100 GPUs can support up to 16,384 GPUs, offering a peak performance of 65 FP8/INT8 exaFLOPS and a combined network throughput of 13 Pb/s (13 petabits per second).
  • H200 GPU-powered OCI Superclusters, launching later this year, will scale up to 65,536 GPUs, delivering up to 260 FP8/INT8 exaFLOPS and 52 Pb/s in network throughput.  
  • Finally, OCI Superclusters based on Blackwell B200 GPUs will scale up to 131,072 GPUs and will offer peak performance of up to 2.4 FP8/INT8 zettaFLOPS.
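The cluster-level figures above can be broken down per GPU with simple division. The sketch below is only illustrative: it uses the numbers as quoted in the list, and the per-GPU values it prints are derived from them, not vendor specifications. The `CLUSTERS` table and the `per_gpu` helper are names made up for this example.

```python
# Figures as quoted in the list above. Per-GPU values are derived here
# by division and are NOT official per-device specifications.
CLUSTERS = {
    "H100": {"gpus": 16_384, "peak_flops": 65e18, "network_bps": 13e15},
    "H200": {"gpus": 65_536, "peak_flops": 260e18, "network_bps": 52e15},
    # No network throughput is quoted for the B200 configuration.
    "B200": {"gpus": 131_072, "peak_flops": 2.4e21, "network_bps": None},
}


def per_gpu(cluster):
    """Return (petaFLOPS per GPU, Gb/s per GPU or None if not quoted)."""
    pflops = cluster["peak_flops"] / cluster["gpus"] / 1e15
    net = cluster["network_bps"]
    gbps = None if net is None else net / cluster["gpus"] / 1e9
    return pflops, gbps


for name, cluster in CLUSTERS.items():
    pflops, gbps = per_gpu(cluster)
    net = "n/a" if gbps is None else f"{gbps:.0f} Gb/s"
    print(f"{name}: {pflops:.2f} PFLOPS per GPU, network {net} per GPU")
```

Under these quoted figures, both Hopper configurations work out to roughly 3.97 petaFLOPS and about 793 Gb/s of network throughput per GPU, while the B200 configuration works out to roughly 18.31 petaFLOPS per GPU.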

OCI's upcoming supercomputing clusters far exceed the capabilities of current leading systems in GPU count. The range-topping B200-based OCI Superclusters feature over three times more GPUs than the Frontier supercomputer (which uses 37,888 AMD Instinct MI250X GPUs) and six times more than other hyperscalers offer, according to Oracle.
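The Frontier comparison is easy to check from the two GPU counts given in the article. This is a minimal sketch; the constant names are made up for this example, and the comparison covers GPU count only, not delivered performance.

```python
# GPU counts as quoted in the article.
FRONTIER_GPUS = 37_888       # AMD Instinct MI250X GPUs in Frontier
OCI_B200_MAX_GPUS = 131_072  # maximum size of a B200-based OCI Supercluster

# Ratio of GPU counts only; the two systems use different GPUs.
ratio = OCI_B200_MAX_GPUS / FRONTIER_GPUS
print(f"B200 Supercluster vs. Frontier: {ratio:.2f}x the GPU count")
```

The ratio comes out at about 3.46, consistent with the "over three times" wording.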

"We have one of the broadest AI infrastructure offerings and are supporting customers that are running some of the most demanding AI workloads in the cloud," said Mahesh Thiagarajan, executive vice president, Oracle Cloud Infrastructure. "With Oracle's distributed cloud, customers have the flexibility to deploy cloud and AI services wherever they choose while preserving the highest levels of data and AI sovereignty."

Several companies are already benefiting from this advanced infrastructure. WideLabs and Zoom are leveraging OCI's high-performance AI infrastructure to accelerate their AI development while maintaining sovereignty controls.

"As businesses, researchers and nations race to innovate using AI, access to powerful computing clusters and AI software is critical," said Ian Buck, vice president of Hyperscale and High Performance Computing at Nvidia. "Nvidia's full-stack AI computing platform on Oracle's broadly distributed cloud will deliver AI compute capabilities at unprecedented scale to advance AI efforts globally and help organizations everywhere accelerate research, development and deployment."

The upcoming OCI Superclusters will use Nvidia's GB200 NVL72 liquid-cooled cabinets with 72 GPUs that communicate with each other at an aggregate bandwidth of 129.6 TB/s in a single NVLink domain. Oracle said that Nvidia's Blackwell GPUs will be available in the first half of 2025 (as availability of Blackwell this year will be limited), though it is unclear when OCI will offer fully loaded Blackwell-powered clusters.
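The NVL72 figures above imply a per-GPU NVLink bandwidth and a rough cabinet count for a maximum-size cluster. The sketch below assumes, purely for illustration, that a 131,072-GPU cluster would be built entirely from 72-GPU cabinets; the article does not say so. All variable names are made up for this example.

```python
import math

# Figures as quoted in the article.
GPUS_PER_NVL72 = 72
NVLINK_DOMAIN_TBPS = 129.6  # aggregate TB/s within one NVLink domain
MAX_B200_GPUS = 131_072     # maximum B200 Supercluster size

# Aggregate bandwidth divided evenly across the GPUs in the domain.
per_gpu_tbps = NVLINK_DOMAIN_TBPS / GPUS_PER_NVL72

# ASSUMPTION: the whole cluster is made of NVL72 cabinets.
full_cabinets, remainder = divmod(MAX_B200_GPUS, GPUS_PER_NVL72)
cabinets_needed = math.ceil(MAX_B200_GPUS / GPUS_PER_NVL72)

print(f"NVLink bandwidth per GPU: {per_gpu_tbps:.1f} TB/s")
print(f"Full cabinets: {full_cabinets}, GPUs left over: {remainder}")
print(f"Cabinets needed to reach the maximum: {cabinets_needed}")
```

Under that assumption, the quoted 129.6 TB/s works out to 1.8 TB/s per GPU, and 131,072 GPUs is not a whole multiple of 72, so the maximum configuration would take 1,821 cabinets with the last one only partly used.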
