TechRadar
Craig Hale

Cloudflare says a bad update broke its logging systems, made it lose data

  • Cloudflare confirms update made it lose customer log data
  • The incident lasted 3.5 hours in total, leading to a 55% loss of logs
  • Despite a five-minute fix, the bug caused knock-on issues

Cloudflare has confirmed that a bad software update recently caused it to lose customer log data. The incident lasted around 3.5 hours and resulted in more than half (55%) of logs being lost.

Embarrassed that the error occurred, the California company apologized to customers in a blog post and promised to prevent a similar issue from happening again.

Cloudflare also noted that failures within systems at scale are inevitable, but subsystems should be built to protect themselves in the event of wider issues.

Cloudflare admits to losing data logs

The problem originated with Cloudflare’s Logpush service, which bundles and sends logs from its global network to customers for compliance, debugging and analytics. A routine update to support a new data set ended up misconfiguring the service, causing the issue.

The company says a configuration bug effectively told one of its internal services, Logfwdr, that none of its customers had configured logs to be sent, which led to the loss. Although engineers identified and fixed the bug within five minutes, the issue triggered a deeper bug.

A built-in fail-safe, which sends logs for all customers rather than just those with active Logpush jobs when no configuration is found, ended up overwhelming the system. The buffering system, Buftee, was hit with around 40 times its usual load, rendering it unresponsive.
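The fail-open behavior described above can be illustrated with a minimal Python sketch. This is not Cloudflare's code: the function name, the customer IDs and the 1-in-40 ratio are hypothetical, chosen only to show how an empty configuration can multiply downstream load when the fallback is "send for everyone".

```python
from typing import Iterable


def customers_to_forward(configured: set[str], all_customers: Iterable[str]) -> list[str]:
    """Hypothetical fail-open selection.

    If no customer appears to have a log job configured, assume the
    configuration is broken and forward logs for every customer rather
    than dropping them.
    """
    customers = list(all_customers)
    if not configured:
        # Fail open: an empty config is treated as "forward for everyone".
        return customers
    return [c for c in customers if c in configured]


# Hypothetical fleet: 40 customers, only one with an active log job.
all_customers = [f"customer-{i}" for i in range(40)]
healthy_config = {"customer-0"}
broken_config: set[str] = set()  # what a bad update might publish

normal = customers_to_forward(healthy_config, all_customers)
failed_open = customers_to_forward(broken_config, all_customers)

# Load amplification seen by whatever sits downstream of this selection.
amplification = len(failed_open) / len(normal)
print(f"normal={len(normal)} failed_open={len(failed_open)} amplification={amplification:.0f}x")
```

In this toy setup the downstream buffer would suddenly receive traffic for every customer instead of one, which is the kind of jump a buffering layer needs to be sized or capped for.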

“We accept that mistakes and misconfigurations are inevitable. All our systems at Cloudflare need to respond to these predictably and gracefully,” the company wrote.

Looking ahead, Cloudflare has committed to conducting regular overload tests to simulate this error, providing confidence that its systems can handle future bugs of a similar nature.
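As a rough idea of what an overload test might check, the Python sketch below pushes 40 times a nominal load into a bounded buffer and confirms that it sheds the excess instead of growing without limit. The class, the capacity and the load figures are all assumptions made for illustration; the article does not describe how Cloudflare's tests or Buftee's limits actually work.

```python
from collections import deque


class BoundedBuffer:
    """Hypothetical buffer that rejects new items once it is full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: deque[int] = deque()
        self.dropped = 0

    def offer(self, item: int) -> bool:
        """Accept the item if there is room, otherwise count it as dropped."""
        if len(self.items) >= self.capacity:
            self.dropped += 1
            return False
        self.items.append(item)
        return True


def run_overload_test(capacity: int, normal_load: int, multiplier: int) -> tuple[int, int]:
    """Send multiplier * normal_load items and report (accepted, dropped)."""
    buffer = BoundedBuffer(capacity)
    for item in range(normal_load * multiplier):
        buffer.offer(item)
    return len(buffer.items), buffer.dropped


# Assumed numbers: capacity of 100, normal load of 50 items, 40x surge.
accepted, dropped = run_overload_test(capacity=100, normal_load=50, multiplier=40)
print(f"accepted={accepted} dropped={dropped}")
```

The design choice being tested is that the buffer stays within its capacity and keeps answering under a surge, trading some dropped data for predictable behavior.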
