Tom’s Hardware

Technology

Aaron Klotz

AMD MI300X posts fastest ever Geekbench 6 OpenCL score — 19% faster than RTX 4090, and only eight times as expensive

AMD's fire-breathing MI300X GPU has made its official debut on Geekbench 6 OpenCL, outpacing previous chart-toppers such as the RTX 4090. However, despite being one of the fastest GPUs on the Geekbench 6 charts, the AMD GPU's score does not reflect its real performance and shows why it's a terrible idea to benchmark data center AI GPUs using consumer-grade OpenCL applications (which is what Geekbench 6 is).

Let's get the benchmark numbers out of the way, though. The MI300X boasts a score of 379,660 points in Geekbench 6.3.0's GPU-focused OpenCL benchmark, making it the fastest GPU on the Geekbench browser to date. (Note that it's not listed on the official OpenCL results page yet.) That puts it in pole position, ahead of the second-highest score, which ironically belongs to another enterprise GPU, the Nvidia L40S. The L40S managed 352,507, which in turn beats the RTX 4090's 319,583 result by 10%.

So, the MI300X beats all contenders right now, outpacing the RTX 4090 (the fastest consumer GPU on the list) by 60,077 points or 18.8%. Clearly, other factors are holding back some of these GPUs, as Nvidia's H100 PCIe also shows up on the list with a meager score of only 281,868. Don't use Geekbench 6 OpenCL as a measuring stick for enterprise-grade hardware, in other words. It's like driving a Formula One car in a school zone to check the acceleration and handling of the vehicle.
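For anyone who wants to check the math, the percentages above can be reproduced with a few lines of Python. This is only a sketch using the scores quoted in this article; the `SCORES` table and the `percent_lead` helper are names made up for illustration, not anything Geekbench provides.

```python
# Geekbench 6 OpenCL scores as quoted in the article.
SCORES = {
    "AMD MI300X": 379_660,
    "Nvidia L40S": 352_507,
    "Nvidia RTX 4090": 319_583,
    "Nvidia H100 PCIe": 281_868,
}


def percent_lead(score: int, baseline: int) -> float:
    """Return how far `score` sits above (+) or below (-) `baseline`, in percent."""
    return (score - baseline) / baseline * 100.0


baseline = SCORES["Nvidia RTX 4090"]
# Rank from highest to lowest score and compare each entry to the RTX 4090.
for name, score in sorted(SCORES.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<17} {score:>8,}  {percent_lead(score, baseline):+6.1f}% vs RTX 4090")
```

Run against the RTX 4090 as the baseline, the H100 PCIe lands below zero, which is the oddity the paragraph above points out.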

We should also discuss price. The RTX 4090 is easy enough to pin down, with a $1,599 MSRP and a current lowest price of $1,739.99 online. The MI300X is a different story, as you generally buy those with servers and support contracts. A quick search gives a suggested price of anywhere from $10,000 to $20,000 per GPU. We can't say for certain how accurate that data is, as the companies actually buying and selling the hardware generally don't reveal such information, but the MI300X obviously plays in a completely different league than consumer hardware. You also can't just run out and buy an MI300X to slot into your standard desktop PC; you'll need a server with OCP accelerator module support.
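To put those prices next to the scores, here is a rough points-per-dollar sketch. It assumes the RTX 4090's $1,599 MSRP and treats the $10,000 to $20,000 MI300X range as a low and a high bound; that range is unverified, so the MI300X figures are ballpark at best. The `points_per_dollar` helper and the constant names are hypothetical.

```python
# Figures quoted in the article. The MI300X price range is an unverified
# estimate, so treat every MI300X number below as a rough bound.
RTX_4090_SCORE = 319_583
MI300X_SCORE = 379_660
RTX_4090_MSRP = 1_599.00                     # USD
MI300X_PRICE_RANGE = (10_000.00, 20_000.00)  # USD, low and high estimates


def points_per_dollar(score: int, price: float) -> float:
    """Geekbench 6 OpenCL points bought per US dollar."""
    return score / price


rtx_value = points_per_dollar(RTX_4090_SCORE, RTX_4090_MSRP)
print(f"RTX 4090: {rtx_value:.1f} points per dollar")

for price in MI300X_PRICE_RANGE:
    value = points_per_dollar(MI300X_SCORE, price)
    print(
        f"MI300X at ${price:,.0f}: {value:.1f} points per dollar, "
        f"{price / RTX_4090_MSRP:.1f}x the RTX 4090's MSRP"
    )
```

Under these assumptions the MI300X costs somewhere between roughly 6 and 13 times the RTX 4090's MSRP, so the headline's eight-times figure sits inside that range.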

You only need to look at the rankings to see that all may not be right with this benchmark. The RTX 4090 outpaces the RTX 4080 Super by 28%, which isn't out of the question, but those two GPUs share the same architecture. The RTX 4080 Super, meanwhile, beats AMD's top consumer GPU, the RX 7900 XTX, by 21%. If this were a ray tracing or AI performance test, that gap would be plausible, but in general FP32 compute performance the 7900 XTX tends to be far closer than these results would suggest. And again, that's not even looking at the often terrible results from data center GPUs like the MI300X and H100.

Spec-wise, the AMD MI300X GPU should be in a league of its own. It has 192GB of HBM3 memory with 5.3 TB/s of bandwidth, paired with 304 CDNA 3 Compute Units (CUs) and 163.4 TFLOPS of FP32 performance. And that's not even its strong suit. As an AI GPU, it also boasts 2.6 petaflops of FP16 throughput, as well as 2,600 TOPS of inference performance, which sort of puts the 40 TOPS requirement of Copilot+ to shame. The MI300X also comes with severe power requirements to match, with a peak 750W power rating.

The MI300X is AMD's latest enterprise GPU, designed to compete with the likes of Nvidia's H100 and H200 AI GPUs. The GPU takes advantage of AMD's CDNA 3 graphics architecture and heavily utilizes 3D-stacking technologies. In fact, the GPU itself is so large that it does not come in a traditional PCIe graphics card form factor. In proper AI-based benchmarks, the MI300X is purportedly up to 60% faster than Nvidia's H100, let alone the RTX 4090.

By contrast, the RTX 4090 is barely half as powerful as the AMD chip in FP32 performance. It features 24GB of GDDR6X and 1 TB/s of memory bandwidth, 128 SMs with 82.6 TFLOPS of FP32 compute, and 1,321 TOPS of AI performance. Power consumption is also substantially lower at 450W.
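The gap between the spec sheets and the Geekbench result is easy to quantify. The sketch below uses only the FP32, memory bandwidth, and score figures quoted in this article; the `Gpu` dataclass and its field names are invented for illustration, and peak spec ratios are of course only a crude yardstick for real workloads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    fp32_tflops: float        # peak FP32 throughput quoted in the article
    bandwidth_tb_per_s: float  # memory bandwidth quoted in the article
    geekbench_opencl: int     # Geekbench 6 OpenCL score quoted in the article


MI300X = Gpu("AMD MI300X", 163.4, 5.3, 379_660)
RTX_4090 = Gpu("Nvidia RTX 4090", 82.6, 1.0, 319_583)

# On-paper advantages of the MI300X over the RTX 4090.
fp32_ratio = MI300X.fp32_tflops / RTX_4090.fp32_tflops
bandwidth_ratio = MI300X.bandwidth_tb_per_s / RTX_4090.bandwidth_tb_per_s
# Advantage actually measured by Geekbench 6 OpenCL.
score_ratio = MI300X.geekbench_opencl / RTX_4090.geekbench_opencl

print(f"FP32 ratio:       {fp32_ratio:.2f}x")
print(f"Bandwidth ratio:  {bandwidth_ratio:.2f}x")
print(f"Geekbench ratio:  {score_ratio:.2f}x")
# Share of the on-paper FP32 advantage that shows up in the benchmark score.
print(f"Geekbench ratio / FP32 ratio: {score_ratio / fp32_ratio:.2f}")
```

On paper the MI300X has about twice the FP32 throughput and over five times the memory bandwidth, yet the Geekbench score ratio is only about 1.19x.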

The MI300X's Geekbench 6 debut reveals just how poor such a test is for measuring high-performance GPUs. Sure, sometimes the results aren't terribly out of whack, but OpenCL driver optimizations alone likely account for a large share of the potential performance. The test can run on a wide range of hardware (Qualcomm's Snapdragon X Elite, for example, posts a score of 23,493), but it's clearly not tuned for all potential workloads. Like most synthetic benchmarks, it only looks at a very narrow slice of the potential performance on tap.

And that's fine, as long as people looking at the benchmarks know what they mean. We're pretty certain the MI300X result is just someone with access to the hardware having some fun seeing what would happen on Geekbench 6, rather than a serious effort to evaluate the GPU. We can't wait to see how 1.2 million GPUs in a supercomputer cluster rate in the same test.