
Help! I lost everything! What do I do? Introducing the Internet Archive

Sometimes things happen that are beyond your control. A page on your website, or maybe your whole site, goes missing. Then, to add insult to injury, the backups can’t restore the site. What can you do to recover?

Introducing Internet Archive

Take heart, my friend: all may not be lost. You may not be able to restore the site, but there might be a record of its content at the Internet Archive (archive.org). According to the Internet Archive,

“The Internet Archive is working to prevent the Internet - a new medium with major historical significance - and other "born-digital" materials from disappearing into the past.”

“The Internet Archive is a 501(c)(3) non-profit that was founded to build an Internet library. Its purposes include offering permanent access for researchers, historians, scholars, people with disabilities, and the general public to historical collections that exist in digital format.”

How do I use Internet Archive?

Using the Web collection, it is possible to find archives of your site. These archives are in “flat” HTML format. Only the rendered pages are archived; that is, Internet Archive only saves what a visitor to your site can see.

See archives of your site

To see archives of your site, navigate to the “Wayback Machine” at https://archive.org/web/, and enter the URL for your site.

This will take you to a calendar page that shows the dates on which your site was saved.
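If you would rather get that list of dates in a script than read it off the calendar, the Wayback Machine also exposes a CDX search endpoint. Here is a minimal sketch, assuming the endpoint at https://web.archive.org/cdx/search/cdx returns JSON rows with a header row first when called with `output=json`. The function names (`cdx_query_url`, `capture_dates`) are made up for this example, and the sample rows stand in for a real response.

```python
from datetime import datetime
from urllib.parse import urlencode

# Assumption: the Wayback Machine CDX search endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(site: str) -> str:
    """Build a CDX query asking for at most one capture per day."""
    params = {
        "url": site,
        "output": "json",               # JSON rows, header row first
        "fl": "timestamp,statuscode",   # only the fields we need
        "collapse": "timestamp:8",      # collapse on YYYYMMDD
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def capture_dates(rows):
    """Turn CDX JSON rows into a sorted list of capture dates."""
    if not rows:
        return []
    header, *records = rows
    ts_index = header.index("timestamp")
    dates = {
        datetime.strptime(record[ts_index], "%Y%m%d%H%M%S").date()
        for record in records
    }
    return sorted(dates)


# Hypothetical rows shaped like a CDX JSON response.
sample_rows = [
    ["timestamp", "statuscode"],
    ["20200101120000", "200"],
    ["20200315080000", "200"],
]
for day in capture_dates(sample_rows):
    print(day.isoformat())
```

To use it against a live site, you would fetch `cdx_query_url("example.com")`, decode the JSON body, and pass the result to `capture_dates`.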

If you click on a blue circle in the calendar, you should be able to navigate through the archive of your site as it looked on that date.
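The same lookup can be done without the calendar. The sketch below assumes the Wayback Machine availability endpoint at https://archive.org/wayback/available, which answers with JSON describing the closest saved snapshot. The helper names (`availability_url`, `closest_snapshot`, `find_snapshot`) are hypothetical, and the sample payload is illustrative rather than a real capture.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(site, timestamp=None):
    """Build the lookup URL; timestamp is an optional YYYYMMDD string."""
    params = {"url": site}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(payload):
    """Return the archived URL from a decoded response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def find_snapshot(site, timestamp=None):
    """Ask the live service for the snapshot closest to the given date."""
    with urlopen(availability_url(site, timestamp), timeout=30) as response:
        return closest_snapshot(json.load(response))


# Illustrative payload in the shape the endpoint is assumed to return.
sample_payload = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20200101120000/http://example.com/",
            "timestamp": "20200101120000",
            "status": "200",
        }
    }
}
print(closest_snapshot(sample_payload))
```

Calling `find_snapshot("example.com", "20200101")` would make the network request. If the site was never saved, the function returns `None`.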

Capture a web page now

To capture your site as it is right now, the Wayback Machine provides a “Save Page Now” option at http://archive.org/web/. According to the Internet Archive, this will “Capture a web page as it appears now for use as a trusted citation in the future.”
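If you have more than a page or two to save, you may want to script this. Here is a rough sketch, assuming that requesting https://web.archive.org/save/ followed by the page address triggers a capture. The function names and the `example-archiver/0.1` User-Agent string are invented for illustration, and the service may rate-limit or refuse automated requests.

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

# Assumption: prefixing a page address with this asks for a capture.
SAVE_PREFIX = "https://web.archive.org/save/"


def save_page_now_url(page_url):
    """Build the capture URL, insisting on a full http(s) address."""
    parts = urlsplit(page_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected a full http(s) URL, got {page_url!r}")
    return SAVE_PREFIX + page_url


def request_capture(page_url):
    """Send the capture request and return the HTTP status code."""
    request = Request(
        save_page_now_url(page_url),
        headers={"User-Agent": "example-archiver/0.1"},  # hypothetical name
    )
    with urlopen(request, timeout=120) as response:
        return response.status


print(save_page_now_url("https://example.com/about"))
```

Only `request_capture` touches the network. If you loop over many pages, pause between requests so you do not hammer the service.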

Archive your site

Here are some other ways to help get your site included in the Wayback Machine on an ongoing basis:

  • Have other sites link to it and include it in online directories

  • Make sure your site’s “robots.txt” rules allow crawlers on your site

  • Make sure your site is publicly accessible, that is, you don't need a password to see the site
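You can check the robots.txt point yourself before waiting on a crawler. The sketch below uses Python's standard `urllib.robotparser` to test whether a set of rules would let a given crawler fetch a page. The `crawler_allowed` helper is hypothetical, and `ia_archiver` as the crawler's user agent is an assumption; check the Internet Archive's current documentation for the name it actually uses.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt, page_url, user_agent="ia_archiver"):
    """Return True if the robots.txt text lets user_agent fetch page_url.

    The default user agent is an assumption, not a confirmed value.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


# Example rules: everyone may crawl the site except the /private/ area.
open_rules = "User-agent: *\nDisallow: /private/\n"

# Example rules: the archive crawler is shut out of the whole site.
blocking_rules = "User-agent: ia_archiver\nDisallow: /\n"

print(crawler_allowed(open_rules, "https://example.com/"))
print(crawler_allowed(open_rules, "https://example.com/private/page"))
print(crawler_allowed(blocking_rules, "https://example.com/"))
```

Paste in the contents of your own robots.txt and a few of your page addresses. Any page that comes back `False` is one the rules would keep that crawler away from.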

Limitations

The Internet Archive only saves the rendered pages from your site. This includes HTML, CSS, and JavaScript. If you are using a content management system with a database, such as Drupal or WordPress, neither the database nor any code on the server, such as PHP, is archived.

An archive will not always contain the original site's functionality. According to Internet Archive, when a page requires interaction “with the originating host, the archive will not contain the original site's functionality.”

Check it Out!

Now that you know more about the Internet Archive, you might want to visit the site to see previous versions of websites and to save a web page or two. Hopefully, you will never need to recover content from a missing site, but it’s good to know it’s there should things get beyond your control. Check it out at http://archive.org/web/.

 


Comments

I want to point out that the terms of use for Internet Archive do not cover backups for the general public. However, you may use the Internet Archive Wayback Machine to locate and access archived versions of a site to which you own the rights. The terms of use specify that users of the Wayback Machine are not to copy data from the collection.

You can access the cached version for any page that has been saved by Google with this: http://webcache.googleusercontent.com/search?q=cache:http://example.com/ Change http://example.com/ to any URL.