Backup files to Amazon S3


After a few years of silence, it occurred to me today that I should start writing on my blog again. Over the last 12 months or so I have been involved in developing and revamping several of the websites owned by the company I currently work for.

In this article I'm going to describe the steps I have been using to automate backing up websites hosted on Amazon EC2 to an Amazon S3 bucket.

The strategy I adopted can be broken down into the following 3 steps:

  1. Create a daily backup copy of each database and of the website source code (all my websites were developed using PHP), and compress each one (database and source code) as a tar.gz archive with a timestamp appended to the file name
  2. Push the backup files to an Amazon S3 bucket
  3. Set up a cron job to execute the process

Step 1: Create a copy of each database and the website source code

To achieve this I created a folder called backups (/home/ubuntu/backups) in the home directory and added the necessary instructions to a shell script as follows.

#!/bin/sh

# (1) set up the required variables
# note: the date format itself starts with "_", so the names look like <filename>__YYYY_MM_DD
DB_DUMP=<filename>_$(date +"_%Y_%m_%d").sql
SOURCE_CODE=<filename>_$(date +"_%Y_%m_%d").tar.gz
DBSERVER=<hostname>
DATABASE=<database name>
USER=<database user>
PASS=<database password>

# (2) create a dump of the database
cd /home/ubuntu/backups/ || exit 1
mysqldump --opt --user="${USER}" --password="${PASS}" -h "${DBSERVER}" "${DATABASE}" > "${DB_DUMP}"

# (3) compress the MySQL database dump into a tar.gz archive
tar -zcf "${DB_DUMP}.tar.gz" "${DB_DUMP}"

# (4) create a compressed copy of the website source and move it to /home/ubuntu/backups/
cd /var/www/ || exit 1
tar -zcf "${SOURCE_CODE}" <website source code folder>/
mv "${SOURCE_CODE}" /home/ubuntu/backups/

# (5) delete the backups inside /home/ubuntu/backups/ that are more than 3 days old
cd /home/ubuntu/backups/ || exit 1
find . -maxdepth 1 -name '<filename>_*' -mtime +3 -exec rm {} \;

Save the file as backup.sh inside /home/ubuntu/backups
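
Before trusting step (5) with real backups, it can help to preview what the find expression would delete. The following is only a minimal sketch in a scratch directory: the file names (mysite_...) are made up for illustration, and touch -d assumes the GNU version that ships with Ubuntu.

```shell
#!/bin/sh
# Sketch: preview which backups the retention rule would remove.
# All file names below are hypothetical examples.
WORKDIR=$(mktemp -d)
cd "$WORKDIR" || exit 1

# Create one fresh file and one that looks 5 days old (GNU touch).
touch mysite_new.tar.gz
touch -d '5 days ago' mysite_old.tar.gz

# Same test as in the backup script, but with -print instead of
# -exec rm, so nothing is deleted yet.
OLD_FILES=$(find . -maxdepth 1 -name 'mysite_*' -mtime +3 -print)
echo "$OLD_FILES"
```

Note that find rounds a file's age down to whole days, so -mtime +3 matches files that are at least four days old.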

Step 2: Push the backup files to an Amazon S3 bucket

To achieve this I tried two approaches, and you'll find that the latter is easier. Initially I used the Amazon AWS SDK to move the backup files to the Amazon S3 bucket. Since I used PHP, this approach hit a limitation once an individual file exceeded 4 GB (after the initial compression my backup was over 12 GB), even on a 64-bit Linux box (I used Ubuntu 16.04). To overcome this I split the compressed output into pieces of about 3.6 GB each.

tar czf - / | split -b 3850MB - ${SOURCE_CODE}.tar.gz.
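
Splitting is only half of the story, since the pieces have to be joined again before a restore. Here is a minimal sketch of the round trip, using tiny sizes and made-up file names so that it runs in seconds. The checksum comparison is an extra safety step that was not part of my original setup, and sha256sum is assumed to be available (it is on Ubuntu).

```shell
#!/bin/sh
# Sketch: split an archive into fixed-size pieces and join them again.
# Names and sizes are illustrative only.
WORKDIR=$(mktemp -d)
cd "$WORKDIR" || exit 1

# Build a small sample archive (about 300 KB of random data).
mkdir site
head -c 300000 /dev/urandom > site/data.bin
tar -czf site.tar.gz site/

# Split into 100 KB pieces: site.tar.gz.aa, site.tar.gz.ab, ...
split -b 100k site.tar.gz site.tar.gz.

# Reassemble: the shell expands the glob in sorted order (aa, ab, ...).
cat site.tar.gz.?? > restored.tar.gz

# Compare checksums of the original and the reassembled archive.
ORIG_SUM=$(sha256sum < site.tar.gz)
NEW_SUM=$(sha256sum < restored.tar.gz)
[ "$ORIG_SUM" = "$NEW_SUM" ] && echo "pieces reassemble cleanly"
```

The same cat step applies to the real pieces after downloading them from the bucket, as long as all of them sit in one folder.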

Approach 1: Using Amazon AWS SDK

Download the appropriate Amazon AWS SDK from here. In my case I used the PHP SDK, following the instructions available here, and downloaded the PHP library using the third option (installing via ZIP file).

<?php
require_once('/home/ubuntu/aws/aws-autoloader.php');
use Aws\S3\S3Client;
use Aws\S3\Exception\S3Exception;

$bucket = '<bucket name>';
$pathToFile = '/home/ubuntu/backups/';
// Name of the website source code archive; it must equal the SOURCE_CODE variable in /home/ubuntu/backups/backup.sh.
// The date format starts with "_" because backup.sh builds the name the same way.
$fileNameSourceCode = ['<filename>_'.date('_Y_m_d').'.tar.gz'];
// Name of the compressed database dump; it must equal the DB_DUMP variable in backup.sh followed by ".tar.gz".
$fileNameDBDump = '<filename>_'.date('_Y_m_d').'.sql.tar.gz';

// Access key ID and secret access key of the AWS account
$credentials = new Aws\Credentials\Credentials('<access key ID>', '<secret access key>');

// Instantiate the client.
$s3 = new S3Client([
    'region' => 'us-east-1', // Since I have created the buckets in the US East (N. Virginia) region
    'version' => '2006-03-01', // Standard version number for the S3 service
    'credentials' => $credentials
]);

// Pushing the source code file to the Amazon S3 bucket
foreach ($fileNameSourceCode as $file) {
    if (file_exists($pathToFile.$file)) {
        try {
            // Upload data.
            $result = $s3->putObject([
                'Bucket' => $bucket,
                'Key' => $file,
                'SourceFile' => $pathToFile.$file,
                // 'public-read' would let anyone with the URL download the backup
                'ACL' => 'private',
                // Expires only sets an HTTP caching header and does not delete the object;
                // expiry has to be set on the bucket from the Amazon S3 account
                'Expires' => gmdate("D, d M Y H:i:s T", strtotime("+15 days"))
            ]);

            // Print the URL to the object.
            echo $result['ObjectURL'] . "\n";
        } catch (S3Exception $e) {
            echo $e->getMessage() . "\n";
        }
    }
}

// Pushing the database dump file to the Amazon S3 bucket
if (file_exists($pathToFile.$fileNameDBDump)) {
    try {
        // Upload data.
        $result = $s3->putObject([
            'Bucket' => $bucket,
            'Key' => $fileNameDBDump,
            'SourceFile' => $pathToFile.$fileNameDBDump,
            'ACL' => 'private', // a database dump should never be publicly readable
            'Expires' => gmdate("D, d M Y H:i:s T", strtotime("+15 days")) // see the note on Expires above
        ]);

        // Print the URL to the object.
        echo $result['ObjectURL'] . "\n";
    } catch (S3Exception $e) {
        echo $e->getMessage() . "\n";
    }
}

Save the file as upload_to_s3bucket.php inside /home/ubuntu/backups

Approach 2: Using Amazon S3Tools

S3 Tools provides an easy-to-use command line utility that can push very large files to an Amazon S3 bucket with minimal effort. On Linux and Mac we can use s3cmd, while on Windows there is S3Express. I found this article on TecAdmin, which explains its usage comprehensively. I followed these steps to set it up on my server.

  • Setting up s3cmd on the server

Installation

$ sudo apt-get install s3cmd

Configuration

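You need to provide the Access Key ID and Secret Key of your Amazon AWS account during the configuration, which is started by the following command. As a best practice, it is recommended to create an IAM user and provide its credentials instead of using the root account details. s3cmd saves these settings in the home directory of the user who runs the command, so run it as the user whose crontab will execute the upload later (root, if you install the cron job with sudo crontab -e).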

# s3cmd --configure

  • Setting up the shell script to push the files to S3 Bucket

I then added the following instructions to a second shell script in the same backups folder (/home/ubuntu/backups).


#!/bin/bash

# name of the database dump, it should be equal to the DB_DUMP variable found in /home/ubuntu/backups/backup.sh
_DB_DUMP=<filename>_$(date +"_%Y_%m_%d").sql
# name of the website source code archive, it should be equal to the SOURCE_CODE variable found in /home/ubuntu/backups/backup.sh
_SOURCE_CODE=<filename>_$(date +"_%Y_%m_%d").tar.gz

# cron does not start in the backups folder, so change into it first
cd /home/ubuntu/backups/ || exit 1

# upload the compressed dump (.sql.tar.gz) rather than the raw .sql file
s3cmd put "${_DB_DUMP}.tar.gz" s3://<bucket name>/
s3cmd put "${_SOURCE_CODE}" s3://<bucket name>/

Save the file as upload_to_s3bucket.sh inside /home/ubuntu/backups
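
Since the upload runs unattended, it may be worth refusing to push a file that is missing or unreadable. Below is a minimal sketch of such a guard, assuming the same s3cmd put call as in the script above. The function name upload_if_valid is hypothetical and not part of s3cmd, and the paths in the example call are placeholders.

```shell
#!/bin/sh
# Sketch: upload an archive only if it exists and tar can list it.
# upload_if_valid is a hypothetical helper, not part of s3cmd.
upload_if_valid() {
    file=$1
    bucket=$2
    if [ ! -f "$file" ]; then
        echo "skip: $file not found" >&2
        return 1
    fi
    # tar -tzf lists the archive contents; it fails on a corrupt file.
    if ! tar -tzf "$file" > /dev/null 2>&1; then
        echo "skip: $file is not a readable tar.gz" >&2
        return 1
    fi
    s3cmd put "$file" "s3://$bucket/"
}

# Example call (commented out, it needs a configured s3cmd):
# upload_if_valid "/home/ubuntu/backups/<filename>.tar.gz" "<bucket name>"
```

Listing the archive reads the whole file, so for very large backups this check adds noticeable time before the upload starts.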

Step 3: Set up a cron job to execute the process

Now let's set up the cron job to run the two scripts daily, or at any other required interval.

First, let's make the two shell scripts executable using the following commands

$ chmod +x /home/ubuntu/backups/backup.sh
$ chmod +x /home/ubuntu/backups/upload_to_s3bucket.sh

Open up the terminal and execute the following command

$ sudo crontab -e

Enter the backup line and the upload line for the approach you chose, then save.

# run the backup daily at 01:30 in the morning
30 01 * * * /home/ubuntu/backups/backup.sh

# use this if you used the Amazon AWS SDK approach: run the upload daily at 03:00 in the morning
00 03 * * * php /home/ubuntu/backups/upload_to_s3bucket.php

# use this if you used the S3 Tools approach: run the upload daily at 03:00 in the morning
00 03 * * * /home/ubuntu/backups/upload_to_s3bucket.sh
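
The fixed 03:00 slot assumes that the backup always finishes within 90 minutes. One alternative, sketched here rather than taken from my actual setup, is a small wrapper that starts the upload only after the backup has succeeded, so that a single cron entry is enough. The function name run_backup_chain and the wrapper script itself are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: run the upload only if the backup step exits successfully.
# run_backup_chain is a hypothetical helper name.
run_backup_chain() {
    backup_cmd=$1
    upload_cmd=$2
    if "$backup_cmd"; then
        "$upload_cmd"
    else
        echo "backup failed, upload skipped" >&2
        return 1
    fi
}

# Example call for the s3cmd approach (commented out here):
# run_backup_chain /home/ubuntu/backups/backup.sh \
#                  /home/ubuntu/backups/upload_to_s3bucket.sh
```

This only helps if backup.sh actually returns a non-zero status when something goes wrong, for example by putting set -e near the top of the script.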