The Intel-powered Aurora supercomputer was widely expected to take the top spot from the AMD-powered Frontier, the #1 supercomputer on the Top500 list, but it took second place instead. However, Aurora did take the top spot in the AI-centric HPL-MxP mixed-precision benchmark, allowing Intel to lay claim to powering the fastest AI supercomputer in the world with 10.6 AI Exaflops of performance.
It's noteworthy that Aurora is still not fully operational, so the entire system wasn't used for any of the benchmark submissions. Aurora remains beset by numerous issues, including hardware and cooling system failures, operational errors, and network instability, among others (details in the last section below). The continued issues are a bit surprising: the system was first announced nine years ago, the second revision was announced five years ago (the first version was canceled), and the final components were installed roughly ten months ago.
The system houses 21,248 CPUs and 63,744 GPUs spread across 10,624 compute blades, but Argonne National Laboratory (ANL), which hosts the system, was again unable to submit a full Linpack run for the Top500 list.
Instead, Aurora placed second with 1.012 Exaflops, breaking the Exaflop barrier with 87% of the system active (9,234 of the full 10,624 nodes). This solidifies Aurora's second-place position — Aurora's first submission (with only half the system) also took second place, reaching 585.34 petaflops six months ago.
Aurora is supposed to be faster than Frontier in the High-Performance Linpack (HPL) benchmark and thus take the lead in the Top500 upon completion, but it's clear the system will need more tuning to live up to its billing. Frontier is ~19% faster than Aurora with 1.206 exaflops of performance, and, assuming linear scaling, Aurora still wouldn't win after adding the remaining 13% of nodes that weren't used for the Top500 benchmark run.
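The linear-scaling estimate can be checked with a few lines of Python. This is a rough sketch that uses only the figures quoted in this article; the function and variable names are illustrative, and perfectly linear scaling is an optimistic assumption, since real Linpack runs usually lose some efficiency as node counts grow.

```python
# Figures quoted in the article; Rmax values are in exaflops (EF).
AURORA_RMAX_EF = 1.012        # measured on the partial system
AURORA_ACTIVE_NODES = 9_234
AURORA_TOTAL_NODES = 10_624
FRONTIER_RMAX_EF = 1.206


def project_linear(rmax_ef: float, active_nodes: int, total_nodes: int) -> float:
    """Extrapolate Rmax to the full machine, assuming perfectly linear scaling."""
    return rmax_ef * total_nodes / active_nodes


active_fraction = AURORA_ACTIVE_NODES / AURORA_TOTAL_NODES
frontier_lead = FRONTIER_RMAX_EF / AURORA_RMAX_EF - 1
projected_ef = project_linear(AURORA_RMAX_EF, AURORA_ACTIVE_NODES, AURORA_TOTAL_NODES)

print(f"Active fraction of Aurora:  {active_fraction:.0%}")
print(f"Frontier's lead today:      {frontier_lead:.0%}")
print(f"Projected full-system Rmax: {projected_ef:.3f} EF")
print(f"Still behind Frontier:      {projected_ef < FRONTIER_RMAX_EF}")
```

Under these assumptions, the projected full-system result lands at roughly 1.16 exaflops, still short of Frontier's 1.206 exaflops.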
Intel has ballyhooed Aurora's theoretical peak performance of 2 exaflops (Rpeak), but supercomputers are measured by sustained performance (Rmax). Frontier delivers 70% of its peak as sustained performance in Linpack, while Aurora only delivers 51% of its peak. This should hopefully improve over time, and Aurora would easily take the top spot if it delivered a similar 70% of its peak performance (~1.4 exaflops) during sustained workloads.
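The efficiency comparison comes down to the ratio of Rmax to Rpeak. The sketch below reworks it from the numbers in this article, treating Aurora's Rpeak as the rounded 2 exaflops quoted above (an assumption; the exact listed value may differ slightly). The variable names are illustrative.

```python
# Figures quoted in the article; all values in exaflops (EF).
AURORA_RPEAK_EF = 2.0      # rounded theoretical peak, as quoted
AURORA_RMAX_EF = 1.012     # sustained Linpack result on the partial system
FRONTIER_RMAX_EF = 1.206
FRONTIER_EFFICIENCY = 0.70  # Frontier's Rmax/Rpeak, as quoted

# Fraction of theoretical peak that Aurora sustained in Linpack.
aurora_efficiency = AURORA_RMAX_EF / AURORA_RPEAK_EF

# Sustained performance if Aurora matched Frontier's efficiency.
aurora_at_frontier_efficiency = FRONTIER_EFFICIENCY * AURORA_RPEAK_EF

# Minimum efficiency Aurora would need just to tie Frontier's Rmax.
breakeven_efficiency = FRONTIER_RMAX_EF / AURORA_RPEAK_EF

print(f"Aurora efficiency today:        {aurora_efficiency:.0%}")
print(f"Aurora Rmax at 70% efficiency:  {aurora_at_frontier_efficiency:.2f} EF")
print(f"Efficiency needed to tie:       {breakeven_efficiency:.1%}")
```

By this arithmetic, Aurora would need to sustain a little over 60% of its quoted peak to pass Frontier, which sits between its current 51% and Frontier's 70%.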
I asked ANL if Aurora is expected to take the lead over Frontier in the Top500 upon completion. "There's a contractual target number that is faster than Frontier," a representative responded. "So, if we're successful in reaching that number, we'll be faster than Frontier." Notably, the statement says Aurora should beat Frontier, not that it will. We've followed up for a firm confirmation of the actual performance target.
Aurora took first place in the HPL-MxP mixed-precision benchmark with 10.6 exaflops of AI performance, with only 89% of the system active. This benchmark favors lower-precision arithmetic (FP32 and below, including FP16) over the FP64 used in the Linpack benchmark that determines the Top500 ranking. Thus, this benchmark better represents AI workloads and an increasing number of other real-world applications; FP64 is largely relegated to traditional scientific computing, and some argue it is a shrinking portion of that segment, too.
HPL-MxP is becoming much more important to model real-world performance in the age of AI, but Aurora's position at the top will be hotly contested. There has yet to be a submission from a large-scale Nvidia Grace-Hopper-powered system to the leaderboard. The Alps supercomputer, which now promises 20 exaflops of AI performance, is slated to have all of its 10,752 Grace Hopper processors installed by the end of June 2024, so competition for the leadership spot is on the way.
The High Performance Conjugate Gradients (HPCG) benchmark is also designed to be more representative of real workload applications than Linpack. Aurora performed impressively in this benchmark as well, taking the #3 ranking with a mere 38.5% of the supercomputer active. Aurora also took fifth in the Graph500 benchmark, which is designed to measure performance in data-intensive applications, but ANL didn't specify how much of the system was active for this benchmark run.
Aurora hasn't placed in the Green500, a list of the most power-efficient supercomputers, and that isn't surprising. Aurora will consume up to 60 MW of peak power, slightly more than double Frontier's 29 MW, but we don't know how its final performance will look. It isn't clear if Aurora can beat Frontier in Linpack performance, but even if it does win, it will be by a small amount—certainly not enough to justify the increased power consumption for that particular workload. However, there are plenty of other applications that operate at lower precisions, and power efficiency comparisons will vary by application. Regardless, Nvidia's Grace Hopper systems now comprise five of the top ten systems on the Green500, so it appears that Nvidia has both Intel and AMD beat in the power efficiency department.
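A back-of-the-envelope calculation shows why a narrow Linpack win would not offset the power gap. This sketch uses only the power and performance figures quoted in this article. It compares Aurora's stated peak power with Frontier's quoted figure, which is not how the Green500 measures efficiency (that list uses power measured during the benchmark run), so treat the result as a rough illustration rather than an official ranking input.

```python
# Figures quoted in the article.
AURORA_PEAK_POWER_MW = 60.0
FRONTIER_POWER_MW = 29.0
FRONTIER_RMAX_EF = 1.206
AURORA_RPEAK_EF = 2.0   # rounded theoretical peak, as quoted

# How much more power Aurora can draw than Frontier.
power_ratio = AURORA_PEAK_POWER_MW / FRONTIER_POWER_MW

# Linpack result Aurora would need at 60 MW to match Frontier's
# performance per watt (same exaflops per megawatt).
breakeven_rmax_ef = FRONTIER_RMAX_EF * power_ratio

print(f"Power ratio (Aurora / Frontier):    {power_ratio:.2f}x")
print(f"Rmax needed to match perf per watt: {breakeven_rmax_ef:.2f} EF")
print(f"Exceeds Aurora's quoted peak:       {breakeven_rmax_ef > AURORA_RPEAK_EF}")
```

On these figures, matching Frontier's Linpack performance per watt at full power draw would take roughly 2.5 exaflops of sustained performance, which is more than Aurora's quoted theoretical peak.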
Aurora faces hardware failures, cooling system malfunctions, and other problems
Ten long months passed between the final Aurora hardware being installed and when ANL submitted its benchmarks, raising questions about the source of the continued delay in standing up the full machine. We followed up with Intel on the matter.
"[...] Since we completed the physical delivery of the last compute node at the end of June 2023 (only 10 months ago), we have been working hand-in-hand with Argonne National Laboratory and HPE to fully stabilize and tune the system, including the compute nodes, storage system, fabric, power delivery, and cooling."
"We are also actively working on addressing stability issues like hardware failures, software bugs, cooling system malfunctions, issues with power supply, networking infrastructure stability, environmental factors, and operational errors," the Intel representative said to Tom's Hardware.
Argonne National Laboratory and Intel have yet to provide a firm date for when they expect the system to be fully operational, but we do know that Aurora's window to take the lead in the Top500 is closing. The AMD-powered El Capitan, rated for two exaflops of peak throughput (not sustained), is widely expected to beat Aurora and Frontier in Linpack. Lawrence Livermore National Laboratory submitted early results for sub-scale models of El Capitan today, and the system is expected to be completely installed by the end of 2024.