Igalia, the free software consultancy perhaps best known for its work on the Raspberry Pi's GPU, has revealed that it is investigating NUMA (Non-Uniform Memory Access) emulation for ARM64 devices. The investigations have so far yielded a potential and significant performance uplift for the Raspberry Pi 5, discussed on a Linux kernel list via a message from Tvrtko Ursulin.
The patch series was posted to the mailing list and appears to be around 100 lines long. Those 100 lines, however, could have a big impact on the Raspberry Pi 5 and many other ARM64 devices.
According to the post, "This series adds a very simple NUMA emulation implementation and enables selecting it on arm64 platforms."
According to Ursulin's message, the change improves single-core performance by 6% and multi-core performance by approximately 18% on the Raspberry Pi 5. These figures were determined using Geekbench 6 test runs.
Ursulin explains in a little more depth: "[...] splitting the physical RAM into chunks and utilizing an allocation policy such as interleaving can enable the [BCM2712] memory controller to better utilize parallelism in physical memory chip organization."
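To make the idea in the quote concrete, here is a toy Python model of splitting memory into chunks and interleaving allocations across them. It is only an illustrative sketch: the function names `split_into_nodes` and `interleaved_allocation` are hypothetical, pages are modelled as plain integers, and none of this reflects the patch's actual code or the kernel's allocator.

```python
# Toy model only: NOT the kernel's allocator and NOT the code from the patch.
from itertools import cycle


def split_into_nodes(total_pages, num_nodes):
    """Split page numbers 0..total_pages-1 into num_nodes contiguous chunks."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    base, extra = divmod(total_pages, num_nodes)
    nodes, start = [], 0
    for i in range(num_nodes):
        # The first `extra` chunks get one additional page each.
        size = base + (1 if i < extra else 0)
        nodes.append(range(start, start + size))
        start += size
    return nodes


def interleaved_allocation(nodes, count):
    """Allocate `count` pages, taking the next free page from each chunk in turn."""
    if count > sum(len(node) for node in nodes):
        raise ValueError("not enough pages")
    cursors = [iter(node) for node in nodes]
    allocated = []
    for node_id in cycle(range(len(nodes))):
        if len(allocated) == count:
            break
        allocated.append((node_id, next(cursors[node_id])))
    return allocated


# Example: 16 pages split into 4 emulated nodes, then 8 interleaved allocations.
nodes = split_into_nodes(total_pages=16, num_nodes=4)
print(interleaved_allocation(nodes, 8))
# -> [(0, 0), (1, 4), (2, 8), (3, 12), (0, 1), (1, 5), (2, 9), (3, 13)]
```

In this model, consecutive allocations land in different chunks of the address range, which is the property that may let a memory controller overlap accesses to different parts of the physical memory.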
What could this mean for the Raspberry Pi 5? Overall better performance from an already performant 2.4 GHz Arm CPU, which can be easily overclocked to 3 GHz or more.
The code is out for review, and with a little luck and hard work from the Linux kernel developers, this patch could add even more performance to the Raspberry Pi 5 and many other ARM64 devices.
NUMA, mainly used in systems with multiple processors, is a computer memory design where memory access times depend on the location of the memory relative to the processor. In simple terms, NUMA allows each CPU to have its own bank of locally attached memory while still having access to the memory directly connected to other processors in the system. This results in low latency for 'near' memory (locally attached) but slightly higher latency for 'far' memory (memory directly attached to other processors in the system). NUMA emulation, by contrast, presents a system's memory to the operating system as several such nodes even when the hardware has no such split.
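The near/far trade-off can be expressed as simple arithmetic. The sketch below is a back-of-the-envelope model, not a measurement: the function name `average_latency_ns` is hypothetical, and the latency values in the example are made-up numbers chosen only to show the calculation.

```python
def average_latency_ns(local_fraction, near_ns, far_ns):
    """Weighted average access latency for a mix of near and far accesses.

    local_fraction is the share of accesses (0.0 to 1.0) served by near memory.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0.0 and 1.0")
    return local_fraction * near_ns + (1.0 - local_fraction) * far_ns


# Hypothetical numbers: 80 ns for near memory, 140 ns for far memory,
# with three quarters of accesses hitting near memory.
print(average_latency_ns(0.75, near_ns=80.0, far_ns=140.0))  # -> 95.0
```

The more accesses a workload can keep on its local node, the closer its average latency gets to the near-memory figure.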
The Linux kernel documentation goes into a little more depth on how NUMA is handled in the Linux software stack: "Linux divides the system’s hardware resources into multiple software abstractions called “nodes.” Linux maps the nodes onto the physical cells of the hardware platform, abstracting away some of the details for some architectures. As with physical cells, software nodes may contain 0 or more CPUs, memory and/or IO buses. And, again, memory accesses to memory on “closer” nodes–nodes that map to closer cells–will generally experience faster access times and higher effective bandwidth than accesses to more remote cells."
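On a running Linux system, these nodes are normally visible under `/sys/devices/system/node`, with one `nodeN` directory per node. The following Python sketch lists them along with the CPUs each one contains. It assumes that standard sysfs layout; the helper name `list_numa_nodes` is hypothetical, and on kernels built without NUMA support the directory may simply be absent.

```python
import re
from pathlib import Path

NODE_DIR = re.compile(r"node(\d+)")


def list_numa_nodes(root="/sys/devices/system/node"):
    """Return {node_id: cpulist string} for each nodeN directory under root.

    Returns an empty dict if the directory does not exist.
    """
    nodes = {}
    root_path = Path(root)
    if not root_path.is_dir():
        return nodes
    for entry in root_path.iterdir():
        match = NODE_DIR.fullmatch(entry.name)
        if match and entry.is_dir():
            cpulist = entry / "cpulist"
            # A node with memory but no CPUs has an empty cpulist.
            cpus = cpulist.read_text().strip() if cpulist.is_file() else ""
            nodes[int(match.group(1))] = cpus
    return dict(sorted(nodes.items()))


for node_id, cpus in list_numa_nodes().items():
    print(f"node{node_id}: CPUs {cpus or '(none)'}")
```

A system with a single memory node would typically show only `node0`. With emulated nodes enabled, one would expect several entries here.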
The patch claims, "Code is quite simple and new functionality can be enabled using the new NUMA_EMULATION Kconfig option and then at runtime using the existing (shared with other platforms) numa=fake=
We'll investigate this and see if we can reproduce Igalia's results.