GPU shader cores (called CUDA cores in Nvidia parlance) and ROPs are important aspects of modern GPUs. With its upcoming RTX 50-series, it appears Nvidia has focused on the former rather than both. Harukaze5719 reports on X (formerly Twitter) that Nvidia's upcoming Blackwell RTX 50-series GPUs will reportedly see only their CUDA core counts change relative to the Ada Lovelace RTX 40-series GPUs, with ROPs staying the same on the various tiers. The only exception is the entry-level GB207 die, which will get a whopping 33% reduction in ROP count.
ROPs, or Render Output units (also Raster Operations Pipelines), play a vital role in the traditional GPU 3D rendering pipeline. As the name implies, these handle the processing of pixel and texel information, or in other words, rasterization workloads. ROPs generally aren't as important as shader cores, but they still play a key role in the GPU pipeline. You want to scale the number of ROPs relative to the number of shader cores and other processing clusters to provide optimal performance.
"So then might be like this one? All number based on kopite7kimi and formula isn't change (1 GPC = 1 ROPs / 1 TPC = 2 SM / SM = 128 CUDA)" https://t.co/158neeR86i pic.twitter.com/xmuvANTXi1 (June 11, 2024)
Harukaze5719's new information (which is based on a formula from popular leaker Kopite7kimi) suggests that Nvidia won't be adding more render output units to its gaming-oriented variant of the Blackwell GPU architecture. From the presumably mainstream GB206 all the way up to the flagship GB202 die, the various GPUs will supposedly sport the exact same ROP counts as their Ada Lovelace (RTX 40-series) predecessors. GB207, the only exception, will reportedly go a step further and trim its ROP count by 33% compared to AD107.
It might seem strange for Nvidia not to increase ROP counts, but the company's architects very likely think there are already enough ROPs for Blackwell. As previously mentioned, ROPs aren't the be-all and end-all of GPU performance, especially in modern workloads that incorporate ray tracing, upscaling, and other effects. More ROPs don't necessarily mean more performance if the architecture becomes unbalanced. Nvidia could also be improving per-ROP performance in Blackwell, which would provide another explanation for the rumored changes.
Take GB207's 33% ROP nerf. Nvidia's outgoing AD107 GPU die has an identical ROP count to the slightly larger and thus more expensive AD106 die. But despite this seeming advantage, AD107-based GPUs never managed to compete with AD106-based GPUs. As our RTX 4060 review showed, the AD107-equipped RTX 4060 card comes nowhere near the RTX 4060 Ti in gaming performance. The key differences between the two are the CUDA core counts and other processing cores (RT, tensor, and texture).
Perhaps AD107 was "overspecced" and Nvidia will cut the ROP count with GB207. It also appears Nvidia will be cutting the CUDA core count to just 2,560, fewer than the 3,072 on the RTX 4060. The GB206 meanwhile has up to 4,608 shaders, the same number as AD106 (though the RTX 4060 Ti only had 4,352 cores enabled). These changes will most likely make for a bigger gap between the GB207 and GB206 parts.
Speaking of CUDA cores, Nvidia will supposedly have up to 24,576 shaders (192 SMs, or Streaming Multiprocessors) on its top GB202 die. That die will also have a 512-bit memory interface, which, when coupled with GDDR7, could provide a massive boost to memory bandwidth. GB203, on the other hand, will be similar to the current AD103, with up to 84 SMs and 10,752 shaders compared to 80 SMs and 10,240 CUDA cores on AD103, and the same 256-bit interface (but with GDDR7 support). That makes for an absolutely massive gulf between the potential RTX 5090 and RTX 5080, if these rumors prove correct.
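To put the bus widths in perspective, here is a rough Python sketch of the usual peak-bandwidth arithmetic: bus width in bits, divided by eight, times the per-pin data rate. The 28 Gbps figure is purely a placeholder assumption for illustration; the leak says nothing about GDDR7 speeds, and the function and constant names are our own.

```python
# Peak memory bandwidth sketch. The per-pin data rate below is a
# hypothetical placeholder, NOT a figure from the leak.
ASSUMED_GDDR7_GBPS = 28.0

# Rumored memory bus widths (bits) for the two largest dies.
RUMORED_BUS_WIDTH_BITS = {"GB202": 512, "GB203": 256}


def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s = (bus width in bits / 8) * per-pin rate in Gbps."""
    return bus_width_bits / 8 * data_rate_gbps


for die, bus_bits in RUMORED_BUS_WIDTH_BITS.items():
    peak = bandwidth_gb_s(bus_bits, ASSUMED_GDDR7_GBPS)
    print(f"{die}: {bus_bits}-bit bus -> {peak:.0f} GB/s at the assumed rate")
```

Whatever the real GDDR7 speed turns out to be, the ratio holds: at the same per-pin rate, a 512-bit bus moves twice the data of a 256-bit bus.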
Going down the stack, GB205 replaces AD104, but where AD104 had up to 60 SMs and 7,680 shaders, the new chip will apparently max out with 50 SMs and 6,400 shaders — and again, stick to the same 192-bit memory interface. GB206 will retain the same 36 SMs and 4,608 CUDA core count as its AD106 predecessor, with a 128-bit interface. And last and least, the GB207 die will only offer 20 SMs and 2,560 CUDA cores, with a 128-bit GDDR6 memory interface.
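The shader counts above all follow from the formula quoted in the post (1 TPC = 2 SMs, 1 SM = 128 CUDA cores). As a quick sanity check, this Python sketch recomputes them from the rumored SM counts; the dictionary and function names are our own, and the inputs are unconfirmed leak figures.

```python
# Recompute rumored CUDA core counts from SM counts using the leaker's formula.
CUDA_PER_SM = 128  # from the quoted formula: 1 SM = 128 CUDA cores
SMS_PER_TPC = 2    # from the quoted formula: 1 TPC = 2 SMs

# Rumored maximum SM counts per die (unconfirmed).
RUMORED_SMS = {
    "GB202": 192,
    "GB203": 84,
    "GB205": 50,
    "GB206": 36,
    "GB207": 20,
}


def cuda_cores(sm_count: int) -> int:
    """CUDA cores implied by a given SM count."""
    return sm_count * CUDA_PER_SM


def tpc_count(sm_count: int) -> int:
    """TPCs implied by a given SM count."""
    return sm_count // SMS_PER_TPC


for die, sms in RUMORED_SMS.items():
    print(f"{die}: {sms} SMs, {tpc_count(sms)} TPCs, {cuda_cores(sms):,} CUDA cores")

# Size of the gap between the two largest dies.
gap = cuda_cores(RUMORED_SMS["GB202"]) / cuda_cores(RUMORED_SMS["GB203"])
print(f"GB202 / GB203 shader ratio: {gap:.2f}x")
```

If the rumored SM counts hold, the top die would carry roughly 2.3 times the shaders of the next one down, which is the "massive gulf" between the potential RTX 5090 and RTX 5080.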
It hopefully goes without saying, but readers should take all of the provided information with a huge serving of salt. This unofficial data might come from a leak, or it could simply be rumormongers spitballing various ideas based on what makes sense. Nvidia will release the first two RTX 50-series GPUs toward the end of the year, according to current rumors, but the last three dies won't come out until 2025. That leaves plenty of time for changes and further speculation. We haven't heard about the consumer Blackwell architectural changes either, though it's a safe bet there will be upgraded CUDA, Tensor, and RT cores, and potentially changes to the ROPs and other elements as well.
One thing is certain, though: If Nvidia really does plan on a 512-bit memory interface and up to 192 SMs with the top GB202 solution, that will not come cheap. Ultimate performance, lots of power, and a shark-sized bite out of your bank account.