Tom’s Hardware
Mark Tyson

Website backup crippled by 1.6MB Friends GIF that was replicated 246,173 times, breaking Linux's EXT4 filesystem limit — Jennifer Aniston's 'happy dance' animation ate up 377 gigabytes of data due to security policy


A single tiny GIF, frequently used in chats by a site's community members, ended up adding 377GB to the website's backup quota, breaking its Linux filesystem and causing the backup process to fail. The Jennifer Aniston ‘happy dance’ reaction GIF weighs in at 1.6MB, and in the headlining case, it was duplicated 246,173 times in the backup, writes Discourse tech blogger Jake Goldsborough. This problem was precipitated by, dare we say, an overuse of the happy dance GIF, plus a file security policy implementation. Fixing it wasn’t entirely straightforward.

Discourse is a company and open‑source software project that builds one of the most widely used modern community‑discussion platforms, currently powering over 22,000 online communities. Its real-time chat platform allows users to insert emojis and GIFs in their discussions to liven up debates. But the platform’s ‘secure uploads’ feature means that “when a file moves between security contexts (say, from a private message to a public post), the system creates a new copy with a randomized SHA1,” explains Goldsborough. “The original content is identical, but Discourse treats it as a new file.” So, a popular image or reaction GIF will spread across posts, reposts, and PMs, and each context creates another file copy.

Discourse’s first attempt at a fix for the system being swamped by duplicates was to track original content by its hash. During backup, uploads were grouped by that hash, only the first file in each group was downloaded, and hardlinks were created for the remaining duplicates.
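The grouping-and-hardlinking idea can be sketched in a few lines of Python. This is only an illustration of the approach described above, not Discourse's actual code (Discourse itself is a Ruby application); the function names, the `(name, original_hash)` tuple layout, and the `fetch` callback are all assumptions made for the example.

```python
import os
from collections import defaultdict
from pathlib import Path


def group_by_original_hash(uploads):
    """Map each original content hash to the upload names that share it.

    `uploads` is an iterable of (name, original_hash) pairs (hypothetical layout).
    """
    groups = defaultdict(list)
    for name, original_hash in uploads:
        groups[original_hash].append(name)
    return dict(groups)


def backup_with_hardlinks(uploads, fetch, dest_dir):
    """Download one file per hash group and hardlink the remaining duplicates.

    `fetch(name)` is a hypothetical callback returning the file's bytes.
    Returns the number of downloads actually performed.
    """
    dest = Path(dest_dir)
    downloads = 0
    for names in group_by_original_hash(uploads).values():
        primary = dest / names[0]
        primary.write_bytes(fetch(names[0]))  # the only real download in the group
        downloads += 1
        for name in names[1:]:
            os.link(primary, dest / name)  # extra name for the same inode, no extra data
    return downloads
```

With this scheme, a GIF that appears under thousands of names costs a single download and a single copy on disk, as long as the filesystem accepts all of the links.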

No one told them life was gonna be this way

This seemed like an elegant solution until one of Discourse’s larger customers made everyone aware of the ext4 limit of roughly 65,000 hardlinks per inode. In the headlining case, the backup worked with this first fix, but “instead of one download for all 246,173 duplicates, we got one download plus ~181,000 fallback downloads after hitting the limit,” explains the firm’s blog. “Not the win I expected.”
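The arithmetic behind that figure is simple enough to check. The snippet below is a back-of-the-envelope sketch, assuming the ext4 limit of 65,000 links per inode and that every copy beyond the limit fell back to its own download; the function name is made up for the example.

```python
EXT4_MAX_LINKS = 65_000  # ext4's limit on names (links) pointing at one inode


def fallback_downloads(total_copies, max_links=EXT4_MAX_LINKS):
    """Copies that cannot share the first file's inode once the link limit is hit."""
    return max(0, total_copies - max_links)


# One download plus 64,999 hardlinks fill the inode; everything else falls back.
print(fallback_downloads(246_173))  # 181173
```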

One of Discourse’s other customers had 432GB of uploads and a correspondingly hefty backup. However, analysis indicated that the unique content was just 26GB. In other words, duplicates had inflated the backup by a factor of roughly 16.

The absurdly duplicated file that created 377GB of bloat was Rachel from Friends doing her happy dance. So, the problematic site was obviously quite a happy one, with the reaction GIF “used constantly in posts, PMs, everywhere,” noted Discourse.

Happily, Discourse also worked out a fix for its earlier fix. The new approach begins like the old one, by creating hardlinks. But when the filesystem returns an EMLINK error (too many links), it copies the file locally and treats that copy as the new ‘primary’ for subsequent hardlinks, until the limit is reached again. This new measure “works on any filesystem, no configuration needed,” says Discourse, with some satisfaction.
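A minimal Python sketch of that fallback might look like the following. It is an illustration under stated assumptions rather than Discourse's implementation: the function names are hypothetical, and the `link` parameter exists only so the limit can be simulated without creating 65,000 files.

```python
import errno
import os
import shutil


def link_or_copy(primary, target, link=os.link):
    """Hardlink `target` to `primary`; on EMLINK, copy instead.

    Returns the path to use as primary for the next duplicate.
    """
    try:
        link(primary, target)
        return primary
    except OSError as exc:
        if exc.errno != errno.EMLINK:
            raise  # unrelated failures should not be hidden
        shutil.copy2(primary, target)  # local copy, so no extra download is needed
        return target  # the fresh copy has its own inode and link budget


def place_duplicates(primary, targets, link=os.link):
    """Create every duplicate, rotating the primary whenever the limit is hit.

    Returns how many full copies were needed.
    """
    copies = 0
    for target in targets:
        new_primary = link_or_copy(primary, target, link)
        if new_primary != primary:
            copies += 1
            primary = new_primary
    return copies
```

Because the code reacts to the error instead of hard-coding a limit, the same logic applies on filesystems with a different maximum link count. Under the ext4 limit cited earlier, 246,173 copies would need only a handful of local copies rather than roughly 181,000 fallback downloads.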

Discourse ends by highlighting the lessons learned from its confounding animated GIF duplication frenzy, wryly observing that “now I know Jennifer Aniston can stress-test infrastructure.”
