The Conversation

Toby Murray, Associate Professor of Cybersecurity, School of Computing and Information Systems, The University of Melbourne

A Meta outage hit Facebook, Instagram, WhatsApp and more. Here’s what we know so far

A major outage is affecting users of popular social media and messaging services including Facebook, Instagram and WhatsApp around the globe. All these platforms are run by the social media giant Meta.

As news of the outage spread, we learned that it affected almost all of Meta’s products, including Messenger and Threads, as well as Meta’s business products, such as Facebook Ads Manager and the Messenger API for Instagram.

Most services are beginning to come back online. But what went wrong, and what can we learn from this massive outage?

The scope of the outage

Outages have been reported from the United Kingdom to Canada to the United States and beyond.

The outage was first reported in the US on Wednesday (around 12.30pm in New York, 5.30pm in London, or 4.30am Thursday in Sydney).

Five hours later, Meta posted to X to say it was 99% of the way to resolving the outage.

What might have caused it?

At the moment, there has been no official word on the cause of the outage. However, we can make some educated guesses based on its scope.

From reporting so far, the outage covered not only Meta’s major social media platforms and messaging services, but also some of its business products. It also affected Meta’s Login with Facebook service, which allows users to log in to third-party sites using their Facebook username and password.

Image: Screenshot of Meta’s business product status page, listing business tools with statuses such as 'Major disruption' or 'Resolved'. The page showed outages in several services. Credit: Meta

In other words, there seem to be very few Meta products this outage did not impact.

That suggests that whatever went wrong was a single point of failure: something relied upon by all of Meta’s services, without which the services can’t function.

Design for reliability

These kinds of outages are rare. That’s because major internet platforms are designed to be highly reliable.

The main way reliability is achieved is through replication. When you visit Instagram, for example, your computer connects to a server that sends back your Instagram feed. In fact, Instagram content is not stored on just one computer but is replicated across a massive array of computers known as a content delivery network (or CDN).

Practically all major web platforms, including news sites such as The Conversation, large companies, and online services such as YouTube and Google, use content delivery networks to increase the reliability and efficiency of their websites.

The idea behind a content delivery network is that if one computer in the network has a problem, another can take over in its place. This is what makes the networks reliable.

Content delivery networks also help when websites are under heavy demand. If many people are trying to request the same content, those requests can be spread out between many computers in the network, allowing each to be handled efficiently.
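The two ideas above, failover and load spreading, can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not a description of how Meta or any real content delivery network is built: the `Replica` and `ReplicatedService` classes and their methods are hypothetical names invented for this example.

```python
class Replica:
    """One computer holding a copy of the content (hypothetical model)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def fetch(self, path):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{path} served by {self.name}"


class ReplicatedService:
    """Spreads requests across replicas and skips any that have failed."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self._next = 0  # index of the replica to try first on the next request

    def fetch(self, path):
        n = len(self.replicas)
        for offset in range(n):
            index = (self._next + offset) % n
            try:
                result = self.replicas[index].fetch(path)
            except ConnectionError:
                continue  # failover: try the next replica instead
            # Load spreading: the next request starts at the following replica.
            self._next = (index + 1) % n
            return result
        raise ConnectionError("all replicas are down")


service = ReplicatedService([Replica("a"), Replica("b", healthy=False), Replica("c")])
for _ in range(3):
    print(service.fetch("/feed"))
```

In this sketch, requests alternate between the healthy replicas "a" and "c" while "b" is skipped, and the service fails only when every replica is down.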

The widespread nature of Meta’s outage suggests it might have happened in a part of Meta’s systems that wasn’t replicated. However, we’ll have to wait for word from Meta on the causes before we know for sure.

Lessons to be learned

Meta’s outage comes in the wake of the major outage caused earlier this year by CrowdStrike’s Falcon security software. Falcon’s design meant it was deeply entangled with Microsoft Windows. That made Falcon a single point of failure so that, when it crashed, it brought down Windows as well – in spectacular fashion.

A key lesson from the CrowdStrike outage was that invasive security software such as Falcon should be re-engineered to operate at arm’s length from Windows. This idea is known as fault isolation: systems should be built as a collection of separate components so that if one component fails, it cannot cause the entire system to fail.

This is the reason why modern ships are designed to have multiple internal compartments, with mechanisms to try to make each compartment watertight. That way, if the ship’s hull is breached, water cannot flood the entire ship.
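In software, the same principle can be sketched as follows. This is a minimal illustration, assuming a hypothetical page assembled from independent components; the function names `render_page`, `feed`, `ads` and `messages` are invented for the example and do not reflect how Meta, CrowdStrike or Windows are actually engineered.

```python
def render_page(components):
    """Render each component separately so one failure stays contained."""
    sections = {}
    for name, render in components.items():
        try:
            sections[name] = render()
        except Exception:
            # The "watertight compartment": the failure is confined to this
            # component and the rest of the page still renders.
            sections[name] = "temporarily unavailable"
    return sections


def feed():
    return "latest posts"


def ads():
    # Hypothetical failure in one component.
    raise RuntimeError("ads backend crashed")


def messages():
    return "2 unread"


page = render_page({"feed": feed, "ads": ads, "messages": messages})
print(page)
```

Here the crash in `ads` is reported as "temporarily unavailable" while `feed` and `messages` still return their content. Without the isolation boundary, the single exception would have stopped the whole page from rendering.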

Meta’s outage is a timely reminder of the need to engineer critical systems to maximise their reliability, including minimising central points of failure and employing engineering principles like fault isolation.

Looking ahead

In the meantime, the precise cause of Meta’s outage remains to be determined.

Many people all over the world rely on Meta’s services. These include businesses using Instagram as their primary platform for engaging customers online, or merchants using Facebook Marketplace as a key revenue stream. For many families, WhatsApp has become an indispensable way to keep in contact, especially during times of crisis.

We can only hope Meta will be forthcoming about the causes of this outage and the measures it will put in place to make sure it cannot happen again.

Toby Murray has previously received funding from Facebook. He is the director of the Defence Science Institute, which receives state and Commonwealth funding.

This article was originally published on The Conversation.
