On July 27, Microsoft released a detailed security report on the cause of the CrowdStrike crash that triggered one of the biggest IT outages in history.
Microsoft's report came just a few days after CrowdStrike's post-incident report. Both investigations concluded the same thing: the outage that impacted millions of Windows devices was caused by a bugged driver.
The CrowdStrike outage was effectively triggered by Channel File 291, a file containing problematic data that incorrectly passed validation because of a bug in the "Content Validator," part of CrowdStrike's Content Configuration System. CrowdStrike's kernel driver then read the faulty file.
The problematic data in Channel File 291 triggered an out-of-bounds memory read, which led to the crash. An out-of-bounds memory read occurs when a program tries to access data outside the limits of a buffer, most often past its end. For example, if a program tried to read the 21st item of an array that holds only 20 items, that could cause an out-of-bounds memory read error.
In this case, the error resulted in the infamous Blue Screen of Death (BSOD) Windows operating system crash on July 19, which impacted millions of devices worldwide.
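The out-of-bounds read described above can be illustrated with a short Python sketch. This is not CrowdStrike's code. The function names and the sample data are hypothetical, and Python itself guards against this kind of error, whereas kernel code written in a lower-level language has to perform the check explicitly. The sketch only shows the idea of checking an index against the size of a buffer before reading from it.

```python
def read_field(fields, index):
    """Return fields[index], or None if the index is outside the buffer.

    Hypothetical helper for illustration only. Negative indexes are
    rejected too, because Python would otherwise count from the end.
    """
    if 0 <= index < len(fields):
        return fields[index]
    return None


def find_bad_references(fields, referenced_indexes):
    """Return every referenced index that falls outside the buffer.

    A validator could run a check like this before shipping a file,
    so that bad references never reach the code that reads them.
    """
    return [i for i in referenced_indexes if not 0 <= i < len(fields)]


if __name__ == "__main__":
    fields = ["a", "b", "c"]               # a buffer with three entries
    print(read_field(fields, 1))           # inside the buffer
    print(read_field(fields, 3))           # one past the end, rejected
    print(find_bad_references(fields, [0, 2, 3]))
```

In a kernel driver there is no safety net behind a missing check like this one, which is why a single bad read can take down the whole operating system rather than just one program.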
While the outage has mostly been resolved as of this writing, the results of Microsoft's and CrowdStrike's investigations could have a longer-lasting impact on everyday users. The way antivirus and anti-cheat apps work might be changing soon.
The role of kernel-level access in the CrowdStrike outage
Part of the underlying cause of the CrowdStrike outage was that CrowdStrike's software requires kernel-level access, like many other antivirus programs.
Kernel mode is the deepest, most privileged level of the Windows operating system. It's often used in cybersecurity software since it lets programs scan for malware more deeply, and kernel-level programs are harder for hackers to disable.
By operating on the kernel level, antivirus programs can monitor all the activity on a device to cast the widest net for identifying suspicious activity or files.
For example, the driver involved in the CrowdStrike outage was a file system filter driver. This type of driver is prevalent in antivirus programs and typically monitors new files saved to a device. Filter drivers can also monitor system behavior, which appears to be the case with the driver involved in the CrowdStrike outage.
Unfortunately, the downside of allowing a program to run on such a deep level in the Windows operating system is a higher risk of system crashes if a glitch does slip through.
Microsoft explains in its incident report, "Since kernel drivers run at the most trusted level of Windows, where containment and recovery capabilities are constrained by nature, security vendors must carefully balance needs like visibility and tamper resistance with the risk of operating within kernel mode."
How the CrowdStrike outage could impact kernel-level apps for security and gaming
Microsoft's full report on the CrowdStrike outage is pretty lengthy, but one of the most important sections is at the end, where Microsoft mentions "reducing the need for kernel drivers to access important security data" moving forward.
This is important to note since CrowdStrike is far from the only developer to require kernel-level access for its software. Many consumer cybersecurity apps and anti-cheat programs also require this deep access into users' operating systems.
In recent years, kernel-level anti-cheat software has faced significant controversy in the gaming community. Many gamers see it as a privacy and security risk since hackers could get kernel-level access to their devices if these anti-cheat programs were ever compromised. Hackers have even found ways to bypass kernel-level anti-cheat programs.
However, game developers seem reluctant to abandon kernel-level anti-cheat programs due to the growing difficulty of stopping cheating in competitive games.
The CrowdStrike outage may mark a turning point in this issue since it sheds new light on the potential dangers of kernel-level programs. This incident is evidence that gamers may be right to be concerned about the safety of their devices with kernel-level anti-cheat. It also highlights the drawbacks of kernel-level consumer cybersecurity apps.
We could see some of these apps move away from kernel-level access. Microsoft may begin putting more research and development into finding alternative ways to protect users' devices (and stop gamers from cheating) without needing kernel-level permissions.
AI could offer one potential solution. For example, developers have suggested using AI "Human Behavior Detection" to spot cheating in competitive gaming. This approach relies on identifying suspicious behavior in-game rather than scanning every file on a user's device for potential cheating software.
Could similar AI-powered solutions provide alternatives to kernel-level cybersecurity software? That's unclear, but AI will likely play a major role in Microsoft's research efforts in the aftermath of the CrowdStrike outage.
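To make the behavior-based idea concrete, here is a minimal sketch that flags players whose in-game statistics look implausible instead of inspecting files on their devices. It swaps in a simple statistical outlier check (a modified z-score based on the median) rather than a trained AI model, and the player names, reaction times, and threshold are all made-up assumptions for illustration. It does not represent any vendor's actual anti-cheat system.

```python
from statistics import median


def flag_suspicious_players(reaction_times_ms, threshold=3.5):
    """Return players whose typical reaction time is an extreme fast outlier.

    reaction_times_ms maps a player name to a list of reaction times in
    milliseconds. Both the input format and the threshold are hypothetical.
    """
    # Summarize each player by their median reaction time.
    per_player = {name: median(times) for name, times in reaction_times_ms.items()}

    # Robust center and spread across all players.
    center = median(per_player.values())
    mad = median(abs(value - center) for value in per_player.values())
    if mad == 0:
        return []  # no spread between players, so nothing stands out

    flagged = []
    for name, value in per_player.items():
        # Modified z-score: strongly negative means far faster than peers.
        score = 0.6745 * (value - center) / mad
        if score < -threshold:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    sample = {
        "player_a": [250, 260, 240],
        "player_b": [230, 245, 255],
        "player_c": [270, 265, 280],
        "player_d": [255, 250, 262],
        "player_e": [40, 35, 45],  # implausibly fast compared with the rest
    }
    print(flag_suspicious_players(sample))
```

A check like this runs entirely on gameplay data, so it needs no kernel-level access. A real system would have to weigh far more signals to avoid flagging players who are simply very good.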
The CrowdStrike outage may have been the most high-profile IT issue caused by a kernel-level program, but it's certainly not the first time kernel-level errors have led to BSOD crashes for users. Kernel-level software has benefits, but the risks are clearly significant. Users need an alternative that can keep their Windows devices safe without the risk of critical system crashes.