
Creating a Static Copy of a Website

The modern Web is a dynamic place. However, sometimes it's necessary (or desirable) to remove the dynamic functionality of a website, while preserving its static content.

Inspired in part by Karen Stevenson's excellent blog post, "Sending a Drupal Site into Retirement," I wanted to outline a few other techniques for accomplishing this.

Reasons you may want to create a static copy of a site:

  • The site runs on an outdated version of dynamic web software
  • The site has been hacked, but its content is still relevant
  • The site's content has lost its immediacy, but may need to be revived in the future as a dynamic website
  • The site was built in 2004 in ColdFusion by a vendor that has flown the coop (oops)

Method One: wget

Wget is a cross-platform command-line program for retrieving web pages. It's almost like it was built to do this.

Run the following command to crawl www.example.com and save it as flat files in a directory of your choosing (represented here by /path/to/destination/directory):

  wget -P /path/to/destination/directory/ -mpck --user-agent="" -e robots=off --wait 1 -E https://www.example.com/

See this code explained on explainshell
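For reference, here is the same wget command written as a small shell function using wget's long-form option names, so each flag documents itself. The function name mirror_site and the DRY_RUN variable are hypothetical, added only for illustration. Treat this as a sketch rather than a finished tool.

```shell
#!/bin/sh
# Sketch: the wget command from Method One, spelled out with long option
# names. mirror_site and DRY_RUN are hypothetical names for illustration.
#
#   -P  --directory-prefix   save files under this directory
#   -m  --mirror             recursion + timestamping, infinite depth
#   -p  --page-requisites    also fetch images, CSS, and JS each page needs
#   -c  --continue           resume partially downloaded files
#   -k  --convert-links      rewrite links so the copy works locally
#   -E  --adjust-extension   append .html to HTML pages whose URLs lack it
#   --user-agent=""          send no User-Agent header at all
#   -e robots=off            ignore robots.txt
#   --wait=1                 pause 1 second between requests
mirror_site() {
  dest="$1"  # destination directory for the flat files
  url="$2"   # site to crawl, e.g. https://www.example.com/

  # Rebuild this function's positional parameters as the full command line.
  set -- wget \
    --directory-prefix="$dest" \
    --mirror \
    --page-requisites \
    --continue \
    --convert-links \
    --user-agent="" \
    --execute robots=off \
    --wait=1 \
    --adjust-extension \
    "$url"

  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print the command instead of crawling.
    printf '%s ' "$@"
    printf '\n'
  else
    "$@"
  fi
}
```

For example, DRY_RUN=1 mirror_site /path/to/destination/directory/ https://www.example.com/ prints the full command without fetching anything; leave off DRY_RUN=1 to start the actual crawl.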

More Information for the Stanford Web Environment

If you have a Drupal, WordPress, or MediaWiki site hosted on the Stanford WWW servers (AKA "AFS"), you can use the wget method to turn the dynamic site in your cgi-bin directory into a static copy in your WWW directory.

Assume you have a site at http://ponies.stanford.edu that lives at /afs/ir/group/ponies/cgi-bin/drupal:

  1. SSH into corn.stanford.edu
  2. Run the following command:
     wget -P /afs/ir/group/ponies/WWW/ -mpck --user-agent="" -e robots=off --wait 1 -E http://ponies.stanford.edu/
  3. Visit http://www.stanford.edu/group/ponies/ponies.stanford.edu in a browser; you should have a full copy of your production site
  4. You may have to do some cleanup of the HTML code, and may want to rename the directory using the following command:
     mv /afs/ir/group/ponies/WWW/ponies.stanford.edu /afs/ir/group/ponies/WWW/static
  5. Once you've checked everything out and it looks good, you can submit a Virtual Host change request so that ponies.stanford.edu points at www.stanford.edu/group/ponies/static
  6. If you want to then delete the dynamic site, submit a HelpSU request.
  7. Note: if the site is a Drupal site, you may want to disable CSS and JavaScript aggregation before running wget, so that it grabs the original source versions of those files rather than the aggregated ones.
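The HTML cleanup in step 4 can start with a quick search of the mirrored files. wget's --convert-links (-k) rewrites links to pages it did not download as absolute URLs on the original host, so any remaining http://ponies.stanford.edu links likely point at pages that will break once the dynamic site is removed. Forms are also worth finding, since they have nothing to submit to in a static copy. The helper name check_static_copy is hypothetical, and this minimal sketch assumes a grep that supports -r (GNU and BSD grep do; POSIX does not require it).

```shell
#!/bin/sh
# Sketch: list files in a static copy that may still need cleanup.
# check_static_copy is a hypothetical helper name.
# Usage: check_static_copy DIR HOST
check_static_copy() {
  dir="$1"   # e.g. /afs/ir/group/ponies/WWW/static
  host="$2"  # e.g. ponies.stanford.edu

  # Absolute links back to the live host usually mean wget did not
  # download that page, so the link breaks when the dynamic site goes.
  echo "Files still linking to $host:"
  grep -rlF -e "http://$host" -e "https://$host" "$dir" || echo "  (none)"

  # Forms (search, login, comments) have nothing to submit to once static.
  echo "Files containing <form> elements:"
  grep -rliF '<form' "$dir" || echo "  (none)"
}
```

For the example site, you would run check_static_copy /afs/ir/group/ponies/WWW/static ponies.stanford.edu after the rename in step 4.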

Method Two: Drupal's "Disable All Forms" Module

If it's a Drupal site, you can use the Disable All Forms Module. This module does exactly what it says: it disables all forms. Using it requires Bad Judgement (sic).

This method works well if you may want to revive the Drupal site at some point in the future, but don't want to deal with spammers and other malcontents.

Method Three: WordPress Plugin

There are a variety of WordPress plugins to create a static copy of a WordPress site.

Comments

Thank you for this page. With your code (and a bunch of trial and error), I'm finally getting my old Wordpress site downloaded. I was having trouble getting past the front page until I pointed wget at the archives page, and now it's happily Hoovering down all of it.

As a North Canton, Ohio, native, I'm always happy to see "Hoover" used as an eponym.

Just came across this. Good info. Ran this on a WordPress site and a Joomla! site  and got a nice set of flat files. Not tried on Drupal yet but will.