A bug that sometimes caused boot times on AMD Zen 1 and Zen 2 systems to swell to several minutes was patched today for the Linux 6.13 kernel (via Phoronix).
While Linux can take a while to boot on old hardware (nearly five days for the ancient Intel 4004), it’s not an operating system known to be particularly slow. Even an optimization made in August that shaved just 0.035 seconds off the boot time was considered noteworthy, as Linux is already a highly optimized OS.
However, one Nokia employee noticed four weeks ago that more than 10 AMD servers running Zen 1-based Epyc CPUs were taking a while to start up.
“Normally, that trace [a step of the booting process] would be at about 12 seconds with only 1-2 seconds variation across boots. But when applying the mentioned patch, the variation increases,” the Nokia engineer wrote in an email to an AMD employee and the Linux kernel team. “Most boots see no impact, on some boots the time is increased by a few to tens of seconds, and in extreme cases even by several minutes (!).”
The engineer had also determined the issue came down to a change that was added to Linux 6.11 back in May 2023. Called “load late on both threads,” this was supposed to address microcode updates for AMD CPUs that have simultaneous multi-threading (SMT), which is essentially every Zen-based CPU since 2017. SMT gives each core two threads, hence the “both threads” part of the patch.
According to the AMD employee who wrote the “both threads” code, Linux originally checked whether a given microcode update was fine to apply on just one thread or needed to be applied to both threads. However, their patch removed this check, which meant microcode updates would always be applied to both threads going forward.
The Nokia engineer argued that, as a result, a microcode update would be applied successfully to one thread and then applied again to the other, even though the second pass was unnecessary and caused boot times to go up dramatically.
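To make the disagreement concrete, here is a toy model of the behavior described above. It is a sketch, not kernel code: the `Core` class, the `load_late` function, and the `skip_if_current` flag are hypothetical names, and the model assumes the two threads of a core share a single microcode revision, so a thread can be skipped once its sibling has already loaded the update.

```python
from dataclasses import dataclass


@dataclass
class Core:
    """Toy SMT core: two threads sharing one microcode revision (assumption)."""
    revision: int
    applies: int = 0  # number of times an update was actually written

    def apply(self, new_revision: int) -> None:
        self.revision = new_revision
        self.applies += 1


def load_late(core: Core, new_revision: int, skip_if_current: bool) -> int:
    """Walk both threads of a core and return how many applies took place."""
    for _thread in range(2):
        # Hypothetical rendering of the removed check: skip a thread whose
        # core already reports the new revision.
        if skip_if_current and core.revision >= new_revision:
            continue
        core.apply(new_revision)
    return core.applies


with_check = load_late(Core(revision=1), 2, skip_if_current=True)
without_check = load_late(Core(revision=1), 2, skip_if_current=False)
print(with_check, without_check)
```

In this model the update is written once per core when the check is present and twice when it is removed. The redundant second write stands in for the step the Nokia engineer blamed for the slow boots.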
“It is claimed that the added late loading does no harm on any CPU newer than Bulldozer. Yet, based on my observations, I think this statement may be incorrect,” the Nokia employee wrote in an email to the AMD engineer who wrote the “both threads” patch. Today, not even a full month later, that AMD engineer submitted a patch for the 6.13-rc1 kernel that fixes the issue by flushing microcode updates out of the CPU’s memory buffer, preventing the update from going through a second time.
According to Phoronix, this patch should also be backported to previous stable releases of the Linux kernel, allowing for distros based on pre-6.13 kernels to receive the fix for slow boot times on AMD’s early Zen cores.
Considering the problem was only noticed this month, despite existing for over a year, it’s probable that it didn’t impact very many users or organizations. That’s not surprising since the original Zen CPUs debuted in 2017, and Zen 2 chips arrived in 2019; at this point, very few computers are still running these relatively old CPUs.