If the lawyers have their way, the history of the Internet could disappear. It's time for someone to do something about it.
The World Wide Web, the thing you use every day for information, entertainment, and utility, is arguably roughly 30 years old. Prior to 1993/94, there was no easy way to view and access the Internet's voluminous information, much of it based at universities and academic institutions. The first web browsers, and then the explosion in the use of Netscape, Spyglass, and Internet Explorer, not only opened up the existing libraries but promoted the creation of millions of new information sources, followed quickly by entertainment, news, and shopping.
Throughout most of that history, the non-profit Internet Archive has been there, first capturing the web itself and packaging it as the Wayback Machine, then sucking in all kinds of digital content and acting as a sentinel for our binary thoughts and experiences.
It's like a fully digital counterpart to the US government's Library of Congress, which has spent much of its history archiving physical content in the form of books, music, movies, and artifacts.
The two often work together, and the Internet Archive also gets funding support from the government, but only one of them is being sued on two fronts by lawyers from major book publishers and the RIAA (Recording Industry Association of America) who insist the Internet Archive is nothing more than a mass copyright infringer.
Infringement versus preservation and access
No one is claiming that the Internet Archive profits from offering its vast digital library of hard-to-find books, magazines, and old music. But because it makes this content searchable and fully available to the public, the content owners insist the Internet Archive is exploiting their work. As one lawsuit puts it, when people listen to the plaintiffs' "sound recordings without authorization, neither Plaintiffs nor their artists see a dime. Not only does this harm Plaintiffs and the artists or their heirs by depriving them of compensation, but it undermines the value of music."
The Internet Archive often argues fair use and, in the case of the physical books it makes available, it imposes lending limits no different from those of a physical library.
I understand the need to protect authors and musicians from infringement and lost income, though the content the Internet Archive offers is typically not, for instance, the current top 50 in music or The New York Times' current Best Sellers list. The RIAA's most recent suit concerns music by mostly dead musicians that was originally distributed on 78rpm records.
The Internet Archive stands a fair chance of losing these and probably other legal battles. On the face of it, yes, the IA is probably infringing, even though it makes no money from its library and is only seeking to give access to these books to those who don't, say, have a library nearby, and to preserve music that won't typically be played on your radio or be easily findable on your favorite streaming services. But by sharing this often hard-to-find content, it also levels the playing field. Should only those who can afford to pay be able to read classic works and hear old but iconic music?
A dark path
What happens, I wonder, when the Internet Archive loses these cases and is forced to remove the content and possibly pay damages? Certainly, the publishers' success in one case has led to the other, and it will surely lead to more. Soon every intellectual property holder will be going after the Internet Archive, and eventually it will succumb and disappear.
That's my fear and it should be yours, too.
You may not have noticed that those who create much of the online content you read aren't always precious about preserving it. In fact, some media companies purposely purge old content to help them rise in search results; basically, to game Google's system.
Imagine someone walking into a library and arbitrarily choosing whole shelves to discard because they assumed the books were no longer relevant. This happens every day online.
Google doesn't care. It will surface what's relevant and is happiest when people follow its ever-changing SEO rules. The depth of your archives is less important than how you craft content to elevate it on the search results page.
Plus, Google cares little about the history of the Internet. You used to be able to find deleted web pages by clicking on Google's cached version of a page. Google did away with that feature, and now if a page is gone from someone's server, it's also permanently gone from Google.
Going way back
The Internet Archive, though, is the one place that's preserving the past and can give us a clear picture of the Internet as it was and the stories we wrote. It is a historical record, and much of it is preserved in the aforementioned Wayback Machine. You can enter any domain (living or dead) and then cruise through thousands of live-ish web pages going back to the mid-1990s. The Wayback Machine didn't capture every page every hour of every day. Instead, it has snapshots, but that's something, and it's more than enough when archives of early versions of CNN, PCMag.com, sites like Suck, and, more recently, some of CNET have been wiped off the Internet.
That history is now at risk.
My thinking is that the Internet Archive's frequent partner, the Library of Congress, can put a stop to this by buying the Internet Archive. Make it a government body and protect all that digital content. I suspect these lawyers would be less inclined to sue the US federal government.
The LOC already preserves tweets (or at least it did; I have no idea what it's doing about posts on X) because of their historical significance. Are websites, and how we wrote and talked online about the world around us, any less important? Clearly, the Internet Archive's other work with books, music, photos, and web videos dovetails neatly with the Library of Congress' core objectives. Acquiring the Internet Archive and saving it all just makes sense.
This needs to happen now, before the Internet Archive loses these suits and it's too late.