The next generations of PCIe are becoming so demanding that Intel is now designing techniques to reduce the bus speed, or even the width of the PCIe link, to prevent devices from overheating. Since last year, Intel has been developing a Linux PCIe bandwidth controller driver designed to keep thermals in check, reports Phoronix. That work includes plumbing in new mechanisms for PCIe 6.0.
The source of increasing temperatures for PCIe devices is pretty simple—the devices run faster to saturate the faster bus, thus generating more heat. As the PCIe bus gets faster, it becomes more demanding of signal integrity and less tolerant of signal loss, which is often combated by improving encoding or increasing clocks and power, with the latter two creating extra heat.
The driver's near-term function for PCIe 5.0 is to mitigate thermal issues by reducing PCIe link speeds. In practice, the link downshifts from its standard 32 GT/s per-lane transfer rate to a slower speed to keep heat in the safe zone. The feature is intended to keep devices within safe temperatures even under high loads. While the current focus is on controlling the link speed, plans are underway to extend the functionality to manage PCIe link widths (i.e., the number of active PCIe lanes), which the PCIe 6.0 specification enables. For instance, a PCIe x16 device could shift down to an x8 or x4 connection to control thermals.
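To put rough numbers on what a downshift costs, the sketch below estimates raw one-direction link bandwidth from the per-lane transfer rate and lane count. It is illustrative arithmetic only, not anything taken from Intel's driver: the function name `approx_bandwidth_gb_s` is made up for this example, and the figures cover only PCIe 3.0 through 5.0, which share 128b/130b encoding, while ignoring packet and protocol overhead.

```python
# Illustrative arithmetic only; not part of Intel's bwctrl driver.
# Per-lane transfer rates in GT/s for generations that use 128b/130b encoding.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}

# 128 payload bits are carried in every 130 bits on the wire.
ENCODING_EFFICIENCY = 128 / 130


def approx_bandwidth_gb_s(gen: int, lanes: int) -> float:
    """Approximate raw bandwidth in GB/s, one direction, before protocol overhead."""
    gigabits_per_s = GT_PER_S[gen] * ENCODING_EFFICIENCY * lanes
    return gigabits_per_s / 8  # bits -> bytes


if __name__ == "__main__":
    # Compare a full-speed link with speed and width downshifts.
    for gen, lanes in [(5, 16), (4, 16), (5, 8), (5, 4)]:
        print(f"PCIe {gen}.0 x{lanes}: {approx_bandwidth_gb_s(gen, lanes):.1f} GB/s")
```

Under these assumptions, dropping a PCIe 5.0 x16 link to PCIe 4.0 speed roughly halves raw bandwidth from about 63 GB/s to about 31.5 GB/s, the same result as keeping PCIe 5.0 speed but narrowing the link to x8.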
The introduction of PCIe 6.0 could present a serious thermal challenge, particularly for GPU servers that use hundreds of PCIe lanes simultaneously. "This series adds PCIe bandwidth controller (bwctrl) and associated PCIe cooling driver to the thermal core side for limiting PCIe Link Speed due to thermal reasons," Intel's description of the driver reads. "PCIe bandwidth controller is a PCI express bus port service driver. A cooling device is created for each port the service driver finds if they support changing speeds. This series only adds support for controlling PCIe Link Speed. Controlling PCIe Link Width might also be useful but AFAIK, there is no mechanism for that until PCIe 6.0 (L0p) so Link Width throttling is not added by this series."
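Admins who want to see whether a link is currently running below its maximum speed can already check from user space, since Linux exposes per-device link attributes in sysfs. The sketch below reads `current_link_speed`, `max_link_speed`, `current_link_width`, and `max_link_width` from a device directory under `/sys/bus/pci/devices`. The helper names (`read_link_status`, `is_downshifted`) are hypothetical, the parsing assumes the speed string begins with a number such as `16.0 GT/s PCIe`, and a lower current speed does not prove thermal throttling, because power management or the device's own capabilities can produce the same reading.

```python
from pathlib import Path

# Link attributes the Linux PCI core exposes for devices with a PCIe link.
LINK_ATTRS = (
    "current_link_speed",
    "max_link_speed",
    "current_link_width",
    "max_link_width",
)


def read_link_status(dev_dir):
    """Return the link attributes of one PCI device directory (None if absent)."""
    status = {}
    for name in LINK_ATTRS:
        try:
            status[name] = (Path(dev_dir) / name).read_text().strip()
        except OSError:
            status[name] = None  # attribute missing or unreadable
    return status


def is_downshifted(status):
    """True if the current speed is below the maximum the link supports."""
    cur, top = status["current_link_speed"], status["max_link_speed"]
    if cur is None or top is None:
        return False
    try:
        # Assumption: the string starts with the rate, e.g. "16.0 GT/s PCIe".
        return float(cur.split()[0]) < float(top.split()[0])
    except (ValueError, IndexError):
        return False  # e.g. "Unknown" or an empty string


if __name__ == "__main__":
    for dev in sorted(Path("/sys/bus/pci/devices").glob("*")):
        status = read_link_status(dev)
        if is_downshifted(status):
            print(dev.name, status)
```

Pointing the helpers at a directory path rather than hard-coding sysfs keeps them easy to try against a fake directory tree on a machine without the relevant hardware.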
While Intel's commitment to improving server thermal controls is understandable, how it will be implemented remains to be seen. Intel could use data from thermal sensors in PCIe hosts, endpoints, and retimers provided via standardized interfaces.
The fifth revision of the patch series was posted recently, bringing refinements such as refactoring and clean-ups. This ongoing development reflects Intel's intention to enhance the Linux kernel's capability to handle thermal management with newer and faster PCI Express versions, including 6.0 and 7.0.
Although not yet complete, the latest updates of the driver show promising progress toward integration into the mainline kernel, according to Phoronix. How this will affect performance for AI training and HPC servers remains to be seen, but the capability gives Intel, server makers, and data center admins another way to manage server power consumption and heat dissipation.