We knew NVIDIA's new GH200 Grace Hopper processor was fast, but the first benchmark results reveal exactly how fast. GPTshop.ai, which has built a powerful desktop computer based on the Grace Hopper processor, gave Phoronix access to the chip so it could benchmark it.
NVIDIA's GH200 is a powerful combination of the 72-core Grace CPU and the H100 Tensor Core GPU. It supports up to 480GB of LPDDR5 memory and either 96GB of HBM3 or 144GB of HBM3e memory. The Grace CPU is built on Arm Neoverse-V2 cores, each with 1MB of L2 cache and a total of 117MB of L3 cache.
The NVIDIA GH200 runs standard AArch64 Linux distributions. For testing purposes, Phoronix used Ubuntu 23.10 with Linux 6.5, providing a leading-edge look at the NVIDIA GH200's Linux performance against Intel Xeon Scalable, AMD EPYC, and Ampere Altra Max processors.
CPU performance
The GPTshop.ai GH200 system was tested with 72 cores, a Quanta S74G motherboard, 480GB of RAM, and 960GB and 1920GB Samsung SSDs. All server processors tested were running at their top-rated memory frequencies with the maximum number of memory channels supported.
The initial benchmarks focused on CPU performance, with GPU benchmarks to follow. Unfortunately, there are no power consumption numbers yet, as the NVIDIA GH200 doesn't currently expose any interface under Linux for reading its power or energy use. However, the initial raw CPU benchmark numbers are promising, showcasing the NVIDIA GH200 as a beast in the processor arena.
You can view all of the test results on Phoronix's site, and a mean of all the results can be seen in the chart below. Although the EPYC 9754 came out on top by a clear margin, the GH200 processor took first place in some of the tests.
Summing things up, Phoronix says: “On a geo mean basis across all the benchmarks conducted, the GH200 Grace CPU performance nearly matched the Intel Xeon Platinum 8592+ Emerald Rapids processor. The Arm Neoverse-V2 based Grace CPU tended to be much faster than the 128-core Ampere Altra Max AArch64 server. It will be interesting to see how AmpereOne can compete albeit no hardware available yet for testing. (Unfortunately no AMD MI300A hardware either for testing right now.) The NVIDIA ARM CPU performance has certainly come a long way from benchmarking the NVIDIA Tegra early days for ARM performance.”
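For readers unfamiliar with the "geo mean" Phoronix refers to, here is a minimal Python sketch of how a geometric mean condenses several benchmark results into one figure. The CPU names and scores are hypothetical placeholders, not Phoronix's measurements, and the sketch assumes every result has already been normalized so that higher is better; how Phoronix normalizes its own results is not covered here.

```python
import math


def geometric_mean(values):
    """Return the geometric mean of positive numbers, computed via the mean of logs."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical relative scores per benchmark (baseline = 1.0, higher is better).
# These are illustrative placeholders only, NOT results from the Phoronix tests.
hypothetical_scores = {
    "cpu_a": [1.0, 1.0, 1.0],
    "cpu_b": [2.0, 0.5, 1.0],
    "cpu_c": [4.0, 1.0, 2.0],
}

geo_means = {name: geometric_mean(scores) for name, scores in hypothetical_scores.items()}

for name, value in sorted(geo_means.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:.3f}")
```

Unlike an arithmetic mean, the geometric mean works on ratios, so one benchmark with an unusually large score cannot dominate the summary. In the placeholder data, cpu_b wins one test by 2x and loses another by 2x, so it ends up level with the baseline.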
More of the CPU benchmark numbers are available via this result file. There are also some other benchmarks here from some of the preliminary testing.