


Web Server Backup

September 22, 2005

Get a full local backup of your remote web server with some basic command-line interaction and rsync. Bonus for OS X users: clickable icon backup goodness.

This is one of those handy tricks I discovered way too late, which some of you may not already know.

Problem: You have a web server located somewhere not physically close to you. You use FTP to send and receive files. You’re generally okay with this setup, except for one little chink in the armor: backup. Even if you don’t run remote scripts which generate files on the server (I’m looking at Movable Type here) and which you never remember to back up, sooner or later your local copy and the server will fall out of sync.

Solution: How about a way to back up a perfect copy of the remote server, incrementally, so that each new update only downloads the files that have changed (and not the whole multi-gigabyte site)? It’s as great as it sounds.

Caveat: Though your local computer can run any OS, this only works if the server itself is Unix-based, and you have shell access. If your site runs on IIS, it won’t work. If your host doesn’t provide you with a shell account, it won’t work. In theory, your shell login should be the same as your FTP account, but not necessarily. You may want to get in touch with your host to verify your settings.

Warning: The most important things you should pay attention to are the various path settings. If you get them wrong, and somehow end up moving files to or from the wrong spot, data could become corrupted awfully quickly. The first time you run this, make sure you also have an alternate method of recovering from data loss. Just in case.

Basic Necessities

To pull this off, we need to dip into some Unix hackery, which is a bit scary for those of us used to the cushy buttons and checkboxes of a GUI. If you’re on OS X or Linux, you’ve already got everything you need. Open up the Terminal in the former, or the command line in the latter. (If you’re using Linux, presumably you already know how to get a command line and I don’t have to explain this further, not that I could anyway.)

If you’re on Windows, you’re going to need some extra software, namely something called an “rsync client”. Though it’s probably overkill, grab Cygwin for now — which is a command line environment that comes with a set of powerful tools, all very much like what you get in a Unix-based OS — and you’ll get rsync with it. Install, then run Cygwin and you should be taken to a Unix-like command line.

Finding Your Local Backup Directory

So we should all be on the same page at this point, with a command prompt greeting us (shown below). If you already know how to get to your backup directory on the command line, skip ahead to the header “Running rsync”.

A basic command prompt.

Now we want to find the directory that will house our backed-up site. This can be anywhere on your local system, and getting to it is going to depend largely on your computer’s configuration. In my case, I have a partition on my hard drive called ‘Shine’, which is mounted as a separate volume. This is the equivalent of calling a partition the G: drive in Windows. So let’s begin at the root (otherwise known as /) of our system by issuing the “change directory” command: cd /

A command prompt after running the cd / command

We can take a look at what’s in the root by issuing the “list” command: ls.

Directory listing after the ls command was issued

Where exactly to go from here depends on your OS; on a Mac, partitions are mounted under /Volumes. Under Cygwin on Windows, the User Guide should help you figure out where you need to go. So if we’re working on a Mac, let’s change the directory to /Volumes and take a look at what’s in it using the ls command again:

Issuing the ls command lists two directories, Sparkle and Shine

On my system we see two volumes, Sparkle and Shine, which correspond with my local partitions. I’m going to skip the ensuing directory drill-down to find my ultimate destination, but by continuing to use cd and ls to navigate your file system, find the directory you’ll be storing your backup in. (You can either create it ahead of time with the file manager in your OS, or use the Unix mkdir command once you’re in the parent directory.) Your prompt will likely show the current path; if not, you can display it by invoking the pwd command:

The working directory is /Volumes/Shine/Personal/mb-backup
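
For reference, the navigation steps above boil down to a handful of commands. This is only a sketch: the directory name below is a made-up placeholder under a temporary location, not the path from my system, so substitute wherever you actually want the backup to live.

```sh
# Hypothetical location; replace with the directory that should hold the backup.
BACKUP_DIR="${TMPDIR:-/tmp}/mb-backup-example"

mkdir -p "$BACKUP_DIR"   # create the directory (and any missing parents)
cd "$BACKUP_DIR"         # move into it
pwd                      # print the full path to confirm where we are
ls -a                    # list the contents, including hidden files
```

If pwd prints the path you expected, you are in the right place to run rsync with a trailing period as the destination.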

Running rsync

Now we’re ready. I’ll cut to the chase and just show you right now what you’re going to be typing (more or less), and explain it afterward:

rsync -aze ssh username@67.19.16.228:/home/username/public_html/ .

Let’s break it down piece-by-piece.

rsync - the program name itself; this just causes it to run.

-aze - these are three options we’re specifying. a sets archive mode, which recurses into directories and preserves things like permissions, timestamps, and symbolic links. z compresses file data to speed up the transfer. e names the remote shell program rsync should use to reach the server, and must be followed directly by that program’s name. There are more options available, but these are the essential ones for what we’re trying to accomplish.

ssh - ssh, or secure shell, is a method of securely connecting to a remote server. The previous e option told rsync that we wanted to do so, and ssh is the protocol we’re going to use to do it.

username@ - this is your username on the remote server. Again, this may be similar to your FTP program’s login, or it may not. You’ll want to contact your host if you don’t know what your shell login is.

67.19.16.228 - this is the IP address of your web server. You likely won’t be able to just enter yourdomain.com here, so using your IP address is the best bet. However, that’s a pain when you don’t have a static IP, so alternatively this can also be the name of your host’s server. I can use aristotle.multipattern.com in place of an IP address, for example.

:/home/username/public_html/ - this is the full server path to the root of the directory you want to back up. Note the preceding colon; it is important for separating the IP address from the server path. By full server path, I mean you need to know where your site sits within the filesystem of the remote server. You might be able to find this with your FTP program by continuing to navigate up in the hierarchy until you can go no further; then simply chain together the resulting directories you navigated through until you get a full path back down the hierarchy to your web site’s root. Otherwise, you may need to contact your host for the full path.

. - and finally, an important trailing space followed by a single period. This indicates the current local path, which is where we navigated to earlier. Alternatively you could skip the initial step of finding this on the command line and use an absolute path here instead of a period, e.g. /Volumes/Shine/Personal/mb-backup.
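
To see how the pieces fit together before committing to anything, here is a small sketch that assembles the command from its parts and only prints it. The variable names are mine, and the values are the same placeholders used above; swap in your own.

```sh
#!/bin/sh
# Placeholder values from the article; replace every one with your own settings.
REMOTE_USER="username"
REMOTE_HOST="67.19.16.228"
REMOTE_PATH="/home/username/public_html/"
LOCAL_PATH="."

# user@host:path is the form rsync expects for a remote source.
SOURCE="${REMOTE_USER}@${REMOTE_HOST}:${REMOTE_PATH}"

# Print the command instead of running it, so the paths can be checked first.
echo rsync -aze ssh "$SOURCE" "$LOCAL_PATH"

# For a rehearsal that contacts the server but changes nothing locally,
# rsync's -n (--dry-run) option can be added, with -v to list the files:
#   rsync -azvn -e ssh "$SOURCE" "$LOCAL_PATH"
```

The trailing slash on the remote path matters: with it, rsync copies the contents of public_html into the destination; without it, rsync creates a public_html directory inside the destination.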

At this point, if you have the correct data entered, you should be ready to go. Hit return, and if the server is found, it will prompt you for your password. Enter it, then wait. The first sync will take quite a while.

If everything is working properly, it will appear that nothing is happening; when rsync has finished synchronizing, the command prompt will simply pop up again with no message one way or another, and you’ll be able to view the results by issuing an ls command. If you don’t see your entire remote server’s contents now on your local hard drive, something has gone wrong. (For some reason on OS X, I get a message informing me that “stdin: is not a tty”. It doesn’t seem to affect the backup though, and everything else runs as expected.)

Aliasing your Backup

That’s about it if you don’t mind entering the command manually every time you want to back up. But you can also create an alias or a shell script for the entire command that will make life a little easier. In this case, make sure to use the full absolute path on your local machine instead of the period, so that the alias or script can be called from anywhere.

Aliasing involves opening up your shell user profile. There are a bunch of different Unix shells, bash being a more common one. Each will have its own profile naming scheme. In bash, this is .bash_profile, and creating an alias means adding a line like this with your own settings: (make sure it’s all on one line)

alias backup='rsync -aze ssh username@67.19.16.228:/home/username/public_html/ /Volumes/Shine/Personal/mb-backup'

The user profile file itself is stored in your home directory, which is most likely the directory that loads when you first open up the command line — if not, you can get to it with the command cd ~. It may be difficult to open a file with a preceding period in Windows (if Cygwin even uses this format); unfortunately I can’t really be of much more help here, so the User Manual is once again your friend.

Assuming you’ve managed to create the alias, you can now invoke the backup simply by typing backup on the command line.
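
If an alias feels too rigid, a shell function in the same profile file is one possible alternative, since it can pass extra options through to rsync. A minimal sketch, assuming the same placeholder address and paths; the name site_backup is arbitrary:

```sh
# Sketch of a shell function for ~/.bash_profile. The name site_backup is
# arbitrary, and the address and paths are the article's placeholders.
site_backup() {
    # "$@" passes along any extra rsync options given on the command line.
    rsync -az -e ssh "$@" \
        username@67.19.16.228:/home/username/public_html/ \
        /Volumes/Shine/Personal/mb-backup
}
```

Typing site_backup then behaves like the alias, while something like site_backup -v would add verbose output for that one run.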

OS X Shell Script

We can take it one step further in OS X though, and create a clickable icon for the backup. This involves opening a text editor and creating a new text file, which we’ll save as a shell script. Enter the following as the contents of the file, replacing with your own settings where appropriate:

#!/bin/bash
rsync -aze ssh username@67.19.16.228:/home/username/public_html/ /Volumes/Shine/Personal/mb-backup

The rsync line is identical to the command we generated earlier, apart from the absolute local path, and must all be on one line. Save this file wherever you want it, but make sure to give it a “.command” extension. Also very important: make sure the line breaks are in Unix format, not Macintosh or DOS.

Once you have this file saved, you’ll need to make sure you have executable permissions on the file. Open up the Terminal again and find the directory you’ve saved it in, then issue this command:

chmod 744 filename.command

This last step may be necessary, depending on your system configuration. In the Finder, right-click (Ctrl-click if you have to) on the file and select “Get Info”. In the “Open with” menu, select Terminal from the list. Close the dialogue, and you’re done.

Now whenever you wish to back up your server, all you need to do is double-click the icon and enter your password. If it’s not working as expected, check out this tutorial on executable scripts for more help.
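
If you would rather not fight a text editor over line endings, the file can also be created from the Terminal itself. This is a sketch under assumptions: the file name and its temporary location are invented for the example, and the rsync line uses the same placeholder settings as before.

```sh
# Hypothetical name and location; save it wherever you want the icon to live.
SCRIPT="${TMPDIR:-/tmp}/site-backup.command"

# Write the two-line script. A here-document written from the shell
# produces Unix line endings.
cat > "$SCRIPT" <<'EOF'
#!/bin/bash
rsync -aze ssh username@67.19.16.228:/home/username/public_html/ /Volumes/Shine/Personal/mb-backup
EOF

chmod 744 "$SCRIPT"   # owner may read, write and execute; everyone else may only read
ls -l "$SCRIPT"       # confirm the permissions took effect
```

The listing should show an x in the owner's permissions, which is what lets the Finder run it.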

Finally, if this simple set of Unix commands is brand new to you, you may also wish to look into the ability of ssh to lock down your mail, especially if you use a wireless internet connection of any kind.

There’s gold in the Unix command line. It’s worth learning.


Tom says:
September 22, 02h

What do you know about this utility:

http://www.apple.com/downloads/macosx/system_disk_utilities/rsyncx.html

September 22, 03h

I wrote an article on using rsync with Strongspace on Windows, that can be readily adapted to any SSH server.

http://www.antidis.com/articles/2005/08/windows-rsync/

Dave, you should look at generating SSH keys to avoid having to type your password at each login. The guide for Strongspace is here:

http://www.strongspace.com/weblog/tips-and-tricks/using-ssh-keys-for-a-quick-sftp-login

…but it’s essentially the same for every Unix SSH server.

September 22, 03h

Wow, thank you. Your article has just saved me a heap of effort, thanks again :)

Dave S. says:
September 22, 04h

“What do you know about this utility:”

RsyncX is great for local system backups to external drives, I use it all the time. I think I heard once that you’re only going to want to be running it on OS X systems though, due to resource forks, but I’m not sure about the specifics. I’ve never got it working remotely, anyway.

Calrion says:
September 22, 05h

What a great article! I agree with everything except the Cygwin part–I wouldn’t recommend running it unless you have to, Windows doesn’t play very nice with it.

Oh, and you can get your current path in UNIX by using the ‘pwd’ command. Just type pwd at the prompt and it will tell you your current, full filesystem path.

neil says:
September 22, 08h

It’s great to see more people plugging into rsync and how excellent it is. One thing I haven’t been able to figure out is how to do an rsync operation like this, but add in stuff that should be skipped and *not* backed up.

Is there any way to do this outside of creating a copy of the item locally and changing its permissions?

September 22, 08h

Thanks for sharing this. But what about the more important data from the database? How to solve this?

Dave S. says:
September 22, 09h

“But what about the more important data from the database?”

If it’s MySQL, there are plenty of ways. PHPMyAdmin has an option to export databases. Your host’s control panel might too. If nothing else, you can ssh in to your server and use mysqldump. (I touched on the latter briefly at the end of this article a few months ago – http://www.mezzoblue.com/archives/2005/05/19/security_no/ )

September 22, 09h

If you are put off by the lack of feedback as rsync is working you can add the “--progress” option on the command line (note that it can’t appear between the “e” and the “ssh”). Alternatively you could change “-aze ssh” to “-azPe ssh”.

If you don’t want to type your password, you can also generate an SSH key on the command line.

“ssh-keygen -t rsa” will create a keypair in the “.ssh” directory within your home folder. It will ask if you’d like to add a passphrase to your key. If you don’t want a passphrase then you can just press return (but read the warning below).

The keypair is stored in two files: “id_rsa” (the private part) and “id_rsa.pub” (the public part). Now, you need to copy the public part onto your remote machine. It should be added to the file “authorized_keys” in the remote “.ssh” directory (create it if it doesn’t exist).

If you don’t have an authorized_keys file, then you can simply copy “id_rsa.pub” as that file. The following command will do it.

scp ~/.ssh/id_rsa.pub username@remotemachine:.ssh/authorized_keys

(you’ll need to enter your remote password, hopefully for the last time).

Now, about SSH key passphrases. If you didn’t add a passphrase to your key, then it means anyone with the private part of the key (the id_rsa file) can use the key and potentially gain access to the remote machine. If you can guarantee that file is safe (using FileVault, never letting anyone touch your machine, etc) then that’s ok. If you did add a passphrase, then each time you use the key you’ll be asked for the passphrase. This is just as annoying as having to type your password all the time. Fortunately, there are a couple of OSX tools that handle this for you (SSH Keychain and SSH Agent). It’s up to you. Either way, your private key is valuable and you should protect it.
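
One caution on the scp command above: it replaces the remote authorized_keys file, so it only suits the case where none exists yet. A rough sketch of the append approach follows, demonstrated on throwaway local files with made-up key strings; the helper name append_key is hypothetical.

```sh
# append_key PUBFILE AUTHFILE
# Adds the key in PUBFILE to AUTHFILE unless that exact line is already there.
append_key() {
    pub="$1"
    auth="$2"
    key=$(cat "$pub")
    mkdir -p "$(dirname "$auth")"
    touch "$auth"
    # -F fixed string, -x whole line, -q quiet: append only when not present.
    grep -F -x -q -- "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
    chmod 600 "$auth"   # keep the file private to its owner
}

# Local demonstration with fake keys in a temporary directory.
DEMO=$(mktemp -d)
echo "ssh-rsa AAAAEXAMPLEONLY user@laptop" > "$DEMO/id_rsa.pub"
echo "ssh-rsa AAAAEXISTINGKEY other@desktop" > "$DEMO/authorized_keys"

append_key "$DEMO/id_rsa.pub" "$DEMO/authorized_keys"
append_key "$DEMO/id_rsa.pub" "$DEMO/authorized_keys"   # second call changes nothing
cat "$DEMO/authorized_keys"

# A common remote equivalent (asks for the remote password once):
#   cat ~/.ssh/id_rsa.pub | ssh username@remotemachine 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'
```

The existing key stays in place and the new one is added once, however many times the helper runs.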

paulo says:
September 22, 09h

Excellent primer on rsync. I love that little tool!

I like to keep my production sites as clean as possible so when I sync from staging to production I throw a “--delete” in there to remove files on production that are not needed anymore:
rsync -azE --delete /path/on/staging/ /path/on/remote

Just a note – I believe you have to have a copy of rsync that supports “-E” to use that option. -E helps handle the MacOS metadata so some Linux/Unix hosts might not have rsync patched. Conversely if you are not syncing to/from a Mac then you can omit it. I am not a sysadmin so if anyone can confirm that rsync needs to be configured on a Linux/Unix to support MacOS MD that would be great. I use OS X Tiger and my server are updated to handle OS X MD so I always add -E.

A quick and easy way to back up a whole mysql DB is:
mysqldump -ulogin -ppass --opt dbName > dbName.sql

Just SSH to your server, run the command and dl/copy your .sql file for safe keeping.

hth

September 23, 02h

Just a few more tips, lifted straight from my ./scripts directory. As I don’t run a Mac, YMMV, but this should work on any *nix variant.

Firstly, I run a slightly different command, as follows:

rsync -azr --delete -e ssh username@yourserver:/home/username/public_html/ /home/username/backups/

The -r is for recursive directory traversal, and the --delete has already been mentioned in comments above.

I also like to back up my databases at the same time by adding the following to the same script:

rsync -azr --delete -e ssh username@yourserver:/var/lib/mysql/ /home/username/backups/mysql

I then have a script set up to run nightly to copy such files to my computer, but this means that if the database somehow gets corrupt just before this time, the backup will also be corrupt. So I run multiple scripts in multiple directories, to keep various snapshots of my files taken from different times - nightly, weekly and monthly. (NOTE: Running automated scripts over SSH requires keypair as described by Dean Jackson, above)

So my scripts would now look something like this:

#nightly.sh
rsync -azr --delete -e ssh username@yourserver:/home/username/public_html/ /home/username/backups/nightly
rsync -azr --delete -e ssh username@yourserver:/var/lib/mysql/ /home/username/backups/nightly/mysql

#weekly.sh
rsync -azr --delete -e ssh username@yourserver:/home/username/public_html/ /home/username/backups/weekly
rsync -azr --delete -e ssh username@yourserver:/var/lib/mysql/ /home/username/backups/weekly/mysql

#monthly.sh
rsync -azr --delete -e ssh username@yourserver:/home/username/public_html/ /home/username/backups/monthly
rsync -azr --delete -e ssh username@yourserver:/var/lib/mysql/ /home/username/backups/monthly/mysql

Just add such scripts to your crontab, to run at the appropriate times, and you should be backed up fairly well, with 3 backups to cover you at any particular time.

40 5 * * * /home/username/backups/nightly.sh
30 5 * * 0 /home/username/backups/weekly.sh
20 5 1 * * /home/username/backups/monthly.sh

Hope this helps a few people.
Adam.
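
The three scripts above differ only in their destination directory, so they could be collapsed into one parameterized helper. A rough sketch, assuming the same placeholder server and paths: the function name snapshot_cmd is made up, it covers only the public_html half, and it prints the command rather than running it. The -r is dropped because -a already implies it.

```sh
# snapshot_cmd PERIOD
# Prints the rsync command for one snapshot period (nightly, weekly or monthly).
snapshot_cmd() {
    case "$1" in
        nightly|weekly|monthly) ;;
        *)
            echo "usage: snapshot_cmd nightly|weekly|monthly" >&2
            return 1
            ;;
    esac
    # Placeholder server and paths, as in the scripts above.
    echo "rsync -az --delete -e ssh username@yourserver:/home/username/public_html/ /home/username/backups/$1"
}

snapshot_cmd nightly
snapshot_cmd weekly
```

Once the printed commands look right, replacing the echo with the command itself would turn this into a single script that cron can call with a different argument for each schedule.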

Malcolm says:
September 23, 04h

Just a boring note: ‘-e ssh’ has been the default in rsync since version 2.6.0, so you don’t need to specify it.

September 23, 04h

Interesting… I’ve used rsync to keep my dev and live sites in sync and to atomically launch new features.

But I never really put the pieces together on the remote-backup front. Nice article.

September 23, 04h

This is what I’m talking about! Great post!

GrumpySimon says:
September 23, 09h

As for backing up MySQL databases with rsync - you can just copy the mysql data directory directly. This is much faster than using mysqldump, but will ONLY work for MyISAM tables ( NOT InnoDB! ).

–Simon

Mathias says:
September 23, 09h

“But what about the more important data from the database?”

http://mathibus.com/archive/2005/07/mysql-backup

Harry says:
September 23, 10h

I don’t know enough about rsync, and you may have already considered this, but what about putting your remote directory in Subversion instead?

Dave S. says:
September 23, 10h

re: ssh keys

Thanks all for the suggestions, ssh keys have been on my to-do list for ages. Haven’t had much luck yet, but I’ll keep plugging away at it.

“what about putting your remote directory in Subversion instead?”

I just about added a line to the original article to this effect. svn or cvs would work perfectly well too. I’ve just found them harder to grasp (and higher maintenance) than rsync. Plus, when working on more than one site the rsync method just seems easier. But that’s just me.

Harry says:
September 23, 11h

Again, I don’t know if rsync does this, but the added benefit with version control is that you can revert your entire directory and all of its files to a previous state if the need arises. And the need has arisen for me so I’ve found it useful.

I know what you mean about the setup. I guess it depends on your host. I’m on TextDrive and setting up a Subversion repository was pretty simple. After setup, I read through the free Subversion book online to learn a few basic commands. After the initial sync, keeping in sync is just: “svn up”.

But, as I think you were implying, the important thing is do something that works for you, as long as you do something.

September 23, 12h

Thanks, Dave, and other commenters for the SSH tips. :)

My only tip to add to this would be: for finding out the remote path you’re backing up… to save having to traipse up the directory tree on your remote server using FTP, just SSH into it instead and run the ‘pwd’ (print workin’ directory) command.

x

Henrik says:
September 24, 02h

Hello :)

Why not use the site synchronize method in Dreamweaver?

It’s great :) One click (or perhaps two) and you’re done.

September 24, 06h

To Neil who was asking about skipping directories, I use unison for automated backups, unison is pretty much rsync with some stuff made easier.

http://www.cis.upenn.edu/~bcpierce/unison/

One of the things made easy is that in your ~/.unison/default.prf file (i.e. your preferences file), you can set

ignore = Path music
ignore = Path .trash

or whatever.

I highly recommend unison for this problem specifically.

September 24, 10h

I don’t own a Mac so I don’t know for sure, but couldn’t you use Automator to run your shell script file periodically instead of having to remember to run it?

Just a thought.

September 27, 12h

Des (and Neil), rsync can easily exclude defined files and/or directories using the --exclude flag. It is especially handy as you can use wildcards to omit files with specific patterns in their filenames. Here are simple examples of two common uses (the wildcard is quoted so the shell passes it to rsync untouched):

--exclude=somedir/
--exclude='*.txt'
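
When the list of exclusions grows, rsync's --exclude-from option reads the patterns from a file, one per line. A small sketch, assuming a made-up file name and the placeholder server and paths used elsewhere in this thread; the command is printed rather than run:

```sh
# Hypothetical file name; keep it somewhere outside the backup directory.
EXCLUDES="${TMPDIR:-/tmp}/backup-excludes.txt"

# One pattern per line. No shell quoting is needed inside the file.
cat > "$EXCLUDES" <<'EOF'
somedir/
*.txt
EOF

# Print the resulting command so it can be checked before use.
echo rsync -az -e ssh --exclude-from="$EXCLUDES" username@yourserver:/home/username/public_html/ /home/username/backups/
```

Keeping the patterns in a file also means the alias or script never has to change when the list does.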

Trejkaz says:
September 28, 09h

Putting the remote directory in subversion would be a mess. You’d then have to script something on the remote end to check the whole thing in each day. :-)

For basic site stuff it’s fine and it’s good to version any actual modifications you make to a webapp’s code… but a lot of webapps do store some dynamic things in there which also need to be backed up.

Anthony says:
September 29, 10h

Wow. First time checking out your site and this is really great stuff even for someone only recently becoming involved with server based web design. Thanks for pointing out some useful knowledge, and keep doing what you do.

Ron says:
October 04, 02h

Great information! I am considering using this to transfer a complete site from one host to another so preserving permissions is critical as is keeping directory structure (I’ll be using Mac OSX unless I am forced to use Windows).

When it comes time to restore to the new host, what is the correct way to expand everything so that all directory structure and files/permissions are restored to the same state they were in the original site?

Thanks to all of you for some great info!

tom victory says:
November 02, 02h

1) What are the benefits of having a backup server? In detail, please.

2) What are the disadvantages of having a backup server? In detail, please.

seanmurph says:
December 21, 10h

“If everything is working properly, it will appear that nothing is happening”

Throw on a -v (verbose output) to stay more informed to what exactly rsync is doing…

Christopher says:
January 01, 07h

Just a note on another way to change local directories in the terminal if said directory is open in a window (or you find traversing through the GUI easier than cd/ls). Type cd [space] and simply drag the folder you wish to change to into the Terminal window. Great article - thanks.

anon says:
January 20, 23h

I had a problem: very large files (2GB+) were causing the process to stall. But there’s a --max-size option, so rsync will ignore files over a certain size.