The Internet Archive
The Internet Archive is the greatest repository of information we have available. You could spend your whole life going through it and not make any progress.
“The opportunity before all of us is living up to the dream of the Library of Alexandria and then taking it a step further – universal access to all knowledge. Interestingly, it is now technically doable.” - Brewster Kahle
The Library of Alexandria had one critical flaw that it couldn't do anything about: it didn't exist during the Internet Age. Instead, the Great Library existed more than 2,200 years ago. It was founded during the reign of Ptolemy II Philadelphus, son of Ptolemy I Soter, who founded the Ptolemaic Kingdom after the death of Alexander the Great. While it's not known how many works were stored there, the number is thought to be close to half a million at its height. Who destroyed it? The answer isn't that important; what's done is done, and what's gone is gone. The best we can do is ensure that something like that never happens again.
Something like that was probably in the mind of Brewster Kahle when he started the Internet Archive, a modern-day Library of Alexandria (and as it calls the Internet its home, it's larger than the legendary library could ever hope to have been). You could spend the rest of your life enjoying everything that the Internet Archive (IA from here on) has to offer and still barely scratch the surface. Are you looking for a silent film? A public domain book? An old recording? Some footage of World War II? A web page from way back? A Grateful Dead concert? You'll find all of that and more.
The IA was founded on May 12, 1996, just a month after Brewster Kahle founded Alexa with co-founder Bruce Gilliat. Alexa is best known for the Alexa Rank, which estimates the popularity of a website based on the traffic it gets. The name itself came from the Library of Alexandria, and soon after the company's founding, its operations began to resemble those of the IA - crawling the web and archiving what it came across (the company still crawls and supplies the IA with its results). Just three years later, Kahle & Gilliat sold Alexa to Amazon for $250 million. After that, Kahle focused his attention on the IA.
“But the goal of universal access to human knowledge is in many ways an original goal of the Net. It's a tremendous goal. It makes me want to jump out of bed in the morning and try to get this thing done. People working on digital divide issues want to join in, advocates for children's literacy programs want to join in. It's not about driving slick cars, it's about using this technology for the betterment of education and people. I'll take that any day over random stock option grants.” - Brewster Kahle
Archiving
As the latter part of the name implies, archiving is the main focus of the IA. An archive is where data that is (typically) no longer in active use is kept. This can be done for the sake of posterity, retrieval, or record keeping. Archiving is practiced by everyone to some extent. Are there emails or texts that you haven't deleted yet? Documents that you keep just in case? Things that you've kept in storage, digital or physical? Photo albums that you want to show your kids? If so, then you've archived something. It's a common thing, as natural as breathing. The IA does what everyone does, but more so.
Archiving at that scale is a difficult problem to solve. What medium will you store the data in? How do you account for the degradation of the medium and data? How do you ensure everything is backed up - locally as well as off-site? How do you safely navigate the legal hurdles involved with copyrighted material? How do you ensure everything is easily retrievable? How do you systematically store information? What processes do you have in place to ensure that, should something happen, the data can be quickly placed back online without too much disturbance or hassle? How do you make sure people don't get tangled in an obscure web of data?
Many of those questions - and more - likely don't have a singular (or even satisfactory) answer, but it's imperative to continue nonetheless. One important reason is that legal cases increasingly cite links as evidence. As stated in the Harvard blog, "link rot is... a growing issue". To counteract this, Harvard Library launched perma.cc, a service that allows legal, academic, and other professionals to save an archived copy of a page. It's a useful way to create a permanent record, although unlike the Wayback Machine, you have to save the page manually.
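For readers who want to check whether a rotting link already has a copy in the Wayback Machine, here is a minimal sketch in Python. It assumes the Wayback availability endpoint at archive.org/wayback/available and a JSON reply that nests the nearest capture under "archived_snapshots" and "closest"; that response shape is my understanding of the service rather than something stated in this article, and the function names are made up for illustration.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the Wayback Machine availability service.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url, timestamp=None):
    """Build the lookup URL for a page, optionally near a YYYYMMDD date."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response_text):
    """Return the archived copy's URL from a JSON reply, or None if absent."""
    data = json.loads(response_text)
    # Assumed reply shape: {"archived_snapshots": {"closest": {...}}}
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None
```

To use it, fetch the address returned by availability_query (for example with urllib.request.urlopen) and pass the response body to closest_snapshot. Keeping the network call out of the two functions makes them easy to try against a saved reply first.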
Links aren't the only things that degrade & rot, though. Pages of a book, hard drives where photos are stored, physical film, memory - and virtually all other media - are subject to the ravages of time. Everything degrades. Unless it's copied to a different format (and depending on the lifespan of the format, recopied often) there's no way for it to persist. It's why archiving in the digital age is closer to the ideal than it was in ages past - a 1:1 copy can be created and spawn many more, ensuring longevity and persistence. Do you want to read the very first edition of Dracula, all the way from 1897? You can - easily. And, if the IA achieves its goal, someone from 2897 will be able to read the exact same thing you're able to right now. The power of archiving is ensuring that the original is never forgotten or overwritten by its successors; it will always be there, at your beck & call.
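The idea of a true 1:1 digital copy can be checked in practice by comparing checksums. The sketch below uses Python's standard hashlib module; the function names are hypothetical, and this is only an illustration of the general technique, not a description of how the IA verifies its own holdings.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_exact_copy(original, copy):
    """True when both files have the same SHA-256 digest."""
    return sha256_of(original) == sha256_of(copy)
```

If the stored digest of a file stops matching a freshly computed one, the copy has degraded or been altered, and it is time to restore it from another copy.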
Retrieval
Archiving is only one-half of the process. Retrieval is the other half. What good is information that can't be used later on? Even in the case of posterity, one plans to bring the information back to show future generations. The IA makes much of what it archives available (some things it can't due to copyright restrictions). Without organizations like the IA, the 20th & 21st centuries would end up being cultural black holes to future generations - inaccessible due to the stringent copyright restrictions placed on newly created film, literature, music, and more.
Being able to access the past with a few clicks & keystrokes is a newfound superpower. It used to be the case that you'd need the help of a librarian to find something (and if it was obscure, best of luck). Now, even the most obscure references can be found with the proper search terms. Being able to retrieve information thought lost (because let's face it, no single person can go through what is stored in some archives) equips the retriever to do great things - you can bring it to the fore, modify it for your purpose, or give it new life in some other way.
That isn't overstating it, either. Take vampires as an example. While the myth has existed in several cultures for centuries, vampires hit the popular market thanks to three events:
- The publishing of Dracula in 1897 (setting the stage for the modern myth)
- The release of Nosferatu in 1922 (bringing it to the masses beyond what reading alone could do)
- The public domain status of both of those properties, allowing hundreds of retellings & modifications (providing an endless stream of material for future generations as well)
Being able to retrieve works of old and adapt them to the modern era not only keeps those works visible & popular but also gives them new life. Dracula wouldn't have existed if Stoker hadn't been aware of vampire myths, which only persisted because they were circulated over hundreds of years. It's not too far off to say that the modern world only exists because humans can keep records and pass them down to future generations.
Stats
“I love the era of dreams.” - Brewster Kahle
Here's a quick rundown on some stats lifted from Archive.org:
- 720,000,000,000 (that's 720 billion) archived web pages.
- 35,000,000 (35 million) texts - documents, books, papers, etc.
- 8,300,000 (8.3 million) videos - from military footage to feature films.
- 14,200,000 (14.2 million) pieces of audio - music, recordings, sounds, etc.
- 2,400,000 (2.4 million) episodes of television, such as news, late night, and more.
- 874,000 (874 thousand) pieces in the Software Collection, providing access to who-knows-how-many programs long forgotten.
- 4,400,000 (4.4 million) images to peruse and use.
- 239,000 (239 thousand) videos and sounds of live music and concert footage.
The material - and what you can do with it - runs the gamut. You can find a public domain recording of some music you like and set it as the background to a silent film. The power of use & reuse is one of the greatest creative outlets we have - and is probably the reason myths, stories, and legends made their way to the modern age. A story in its original form can be forgotten if enough time elapses, but if it's important enough, it'll persist through tellings & retellings over the ages. The IA, however, is well equipped to keep the original form from being forgotten. We now live in a world where the originals - the first forms (using "first" in a very liberal sense, as there's nothing new under the sun) of stories - can be accessed well into the future.
Interesting Links
- An interview between Richard Koman & Brewster Kahle about how the Wayback Machine worked in 2002.
- A short profile of Kahle & the IA written by Paul Boutin for Slate in 2005.
- "An Impressionistic Transcript of Brewster Kahle's talk at Wikimania..."
- An article by Brewster Kahle arguing against a settlement in a case that would give Google a veritable monopoly over orphan works.
- Lend Ho! - Forbes Article
- Internet Archive founder turns to new information storage device - the book - The Guardian
- The Cobweb by Jill Lepore