Phrases like “What’s on the internet is forever” have become commonplace, but that claim is becoming less accurate, as Google is retiring the cached copies of web pages it has long offered through Search.
A cached page is a digital copy of a web page that’s kept in temporary storage by a user, an organization, or a service such as a search engine. It essentially functions as a backup or ‘snapshot’ of the page, and Google Search made these snapshots as it combed the web for search results.
Google has been creating and storing cached pages for pretty much as long as Google Search has existed, and its collection is sometimes called a “backup of the Internet.” Caching websites has been fundamental to Google’s search infrastructure since the start, and cached links let people see a past version of a site. This can be handy if the web page has gone offline for some reason, or if it has changed substantially.
The implications and thinking beyond Google
That’s now going to change, and Google will cease caching websites, according to Google’s official “Search Liaison” Danny Sullivan, as explained in his lengthy X (formerly Twitter) post.
He wrote on February 1, 2024:
“Yes, it's been removed. I know, it's sad. I'm sad too. It's one of our oldest features. But it was meant for helping people access pages when way back, you often couldn't depend on a page loading. These days, things have greatly improved. So, it was decided to retire it.”
He also goes on to suggest that Google might make use of the Internet Archive’s library of web page snapshots in the About This Result section of future Google Search results. We can infer that he means the Internet Archive’s Wayback Machine, its digital archive of the World Wide Web. He praises the Internet Archive for what it does and clarifies that Google will have to reach out to the Internet Archive to make this happen.
The Wayback Machine is somewhat similar to Google’s collection of cached pages, except that it is a good deal smaller and is younger than Google Search. While Sullivan is probably on to something, relying on it will be a challenge for Google if it wants to maintain the credibility of its search results.
The Wayback Machine is quite extensive, but it’s not as extensive as Google’s collection of cached information. Also, the Internet Archive is a nonprofit organization with more limited resources than the likes of Google, and it prioritizes more popular and significant websites. Additionally, Google would be relinquishing a great deal of control over the archival process and handing much of that responsibility to the Internet Archive.
The wider implications and how you can cache a page for yourself
Cached pages have been disappearing and reappearing for some users since December 2023, and Ars Technica reports that, as of February 2, 2024, it could not locate any cache links in Google Search.
It does point out that, for now, you can create your own cache links in one of two ways:
1. Type "https://webcache.googleusercontent.com/search?q=cache:" followed by the URL of the page you’d like to view (without the quotation marks)
2. Enter “cache:” (without quotation marks) followed by a URL straight into Google Search.
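The two methods above can be sketched in a few lines of Python. This is only an illustration of how the strings are put together: the function names cache_link and cache_query are made up for this example, the page URL is a placeholder, and there is no guarantee that Google still serves anything at these addresses.

```python
# Prefix quoted in the article; Google may stop answering it at any time.
CACHE_PREFIX = "https://webcache.googleusercontent.com/search?q=cache:"


def cache_link(page_url: str) -> str:
    """Method 1: the cache prefix followed by the page URL (hypothetical helper)."""
    return CACHE_PREFIX + page_url


def cache_query(page_url: str) -> str:
    """Method 2: the text to enter into Google Search (hypothetical helper)."""
    return "cache:" + page_url


if __name__ == "__main__":
    page = "https://example.com/some/page"  # placeholder URL
    print(cache_link(page))   # paste into the browser's address bar
    print(cache_query(page))  # paste into the Google Search box
```

The sketch only builds the strings and makes no network request, so it cannot tell you whether a cached copy actually exists.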
Ars Technica also reports that Google has taken down all of the support pages about cached websites.
Before this change, every result that came up in a Google Search had a cached link in the drop-down menu next to it. A Google web crawler would continuously scour the web for updated pages (to make sure its search results were up to date), and in that process, it would save the version of each page as it stood at the moment the crawler found it.
It’s easy to imagine how all this storing of historical websites could pile up to immense amounts of data, especially as internet use, and use of Google Search in particular, has grown over the past few decades. This might be a move to cut data storage costs, as getting rid of its vast collection of cached pages will probably free up a lot of storage space.
If this is a cost-cutting move, then it will be a real shame, as Google’s caching of the internet has been a vital part of documenting and preserving the ever-evolving World Wide Web. It has been useful not just for researchers and students, but for anyone who wanted to revisit sites and pages that have otherwise disappeared.