Learn how to back up and validate your database backups

In this post we'll focus on backing up our application database. Most websites and web applications are backed by some form of database, so it is important to know how to create, query and back up your database in case anything happens to your data and you need to restore it.

It’s also important to validate your database backups. If you experience significant database issues, you don’t want to find out at the last minute that your nightly backups have not been working correctly. In this post, we will first create database backups with mysqldump and then test those backups. We will then learn to use RDS Snapshots and AWS S3 for storing backups.

We will cover the following topics:

  • Database backups with Mysqldump
  • Testing our database backups
  • Using RDS Snapshots
  • Using AWS S3 for storing backups

Database backups with Mysqldump

When connecting to a database, you’ll need four details:

  • A host address
  • Database name
  • Username
  • Password

The host address will vary depending on where your database is located. If you are backing up a database on your local machine, it could be ‘localhost’, or if you are backing up a database on an RDS instance, the host address could look something like this:

mydatabase.abc123.us-east-1.rds.amazonaws.com

If you would like to install a database locally to use some of the commands in this post, you can install MariaDB Server, a popular relational database used in web applications, by running the following command in your terminal:

sudo apt install mariadb-server

This will install MariaDB Server locally on your machine. By default, you will be able to log in to the database using the following command in your terminal:

sudo mysql -u root

Once you have logged in to the database, you can create a new database user for the examples in this post using the following commands:

CREATE USER 'user'@'%' IDENTIFIED BY 'mypassword';
GRANT ALL PRIVILEGES ON *.* TO 'user'@'%';
FLUSH PRIVILEGES;

The commands above create a user called user with a password of mypassword, grant that user permission to query the database, and finally flush the privileges to make sure the user and permission changes have been applied.

For local testing and development, the above GRANT command provides all privileges to the new database user. In a production environment it is recommended to give reduced permissions to a database user that is used in a web application to help to avoid problems such as deleting data or dropping tables.

A more suitable GRANT command might look like:

GRANT SELECT, INSERT, UPDATE, ALTER, CREATE, INDEX ON *.* TO 'user'@'%';
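
If you also want a separate account used only for running backups, a minimal sketch might look like the following (backup_user and backup_password are placeholder names; SELECT and LOCK TABLES are typically enough for mysqldump, with SHOW VIEW and TRIGGER added if your database uses views or triggers):

CREATE USER 'backup_user'@'%' IDENTIFIED BY 'backup_password';
GRANT SELECT, LOCK TABLES, SHOW VIEW, TRIGGER ON *.* TO 'backup_user'@'%';
FLUSH PRIVILEGES;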

Before we can begin to back up our data, we need to install a program called mysqldump.

On Ubuntu, you can install it using the command:

sudo apt install mysql-client

Once it is installed, we can back up a database using the following command (substituting your database host address, database name, username and password):

mysqldump --host=${DATABASE_HOST} --user=${DATABASE_USERNAME} --password="${DATABASE_PASSWORD}" --single-transaction ${DATABASE} > ${DATABASE}.sql

This command may take some time, depending on the size of your database and the speed of your connection to it. It will create a file with the same name as your database and a .sql extension.
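
To quickly sanity-check the resulting file, you can look at its first few lines and count how many INSERT statements it contains; a healthy dump starts with mysqldump header comments followed by CREATE TABLE and INSERT statements (the file name below assumes the command above was used):

head -n 40 ${DATABASE}.sql
grep -c "INSERT INTO" ${DATABASE}.sql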

If you need to create a sample database with some sample data for the examples in this post, you can use the following:

Create a new file called create.sh with the following content:

#!/usr/bin/env bash
# The following assumes you have mysql or mariadb installed locally
# It will create a database and import the data from the seed file.

DATABASE_HOST="localhost"
DATABASE_USERNAME="user"
DATABASE_PASSWORD="password"
DATABASE="myapplication"

# create the local database
mysql --host=${DATABASE_HOST} --user=${DATABASE_USERNAME} --password="${DATABASE_PASSWORD}" -e "create database if not exists ${DATABASE};"

# import the sample users table and data
mysql --host=${DATABASE_HOST} --user=${DATABASE_USERNAME} --password="${DATABASE_PASSWORD}" ${DATABASE} < seed_data.sql

You will need to update the database variables at the beginning to suit your database host, username and password.

Next, we will create a very small sample set of data. Create a new file called seed_data.sql with the following content:

CREATE TABLE users ( id int unsigned not null auto_increment, name varchar(20) not null, primary key (id) );
INSERT INTO users ( id, name ) VALUES ( null, 'Jane Doe' );
INSERT INTO users ( id, name ) VALUES ( null, 'John Doe' );

The above files will create the myapplication database and import a sample users table into it.

Backing up data on remote hosts

If you are backing up a database from RDS, the backup process will be much quicker when run from an EC2 instance in the same region. Once the backup has completed, you can use a command-line tool such as scp (OpenSSH secure file copy) to copy the backup from the EC2 instance down to your local machine.

Sometimes you might want to back up only a specific table within a database, or to skip a single table if it is too large. You can pass additional parameters to mysqldump to help with those situations.

To back up specific tables, in your terminal use the following command:

mysqldump --host=${DATABASE_HOST} --user=${DATABASE_USERNAME} --password="${DATABASE_PASSWORD}" --single-transaction ${DATABASE} table1 table2 table3 > ${DATABASE}.sql

You will need to replace table1, table2 and table3 with the names of tables in your database.

To skip a specific table, in your terminal use the following command:

mysqldump --host=${DATABASE_HOST} --user=${DATABASE_USERNAME} --password="${DATABASE_PASSWORD}" --single-transaction --ignore-table=${DATABASE}.${sometable} ${DATABASE} > ${DATABASE}.sql

If your backup file is large, you can compress it using the following command:

gzip ${DATABASE}.sql

This should reduce the size of the file quite a bit and make it easier and faster to copy elsewhere.
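
If you want to see how much space the compression saved, gzip can report the compressed and uncompressed sizes of the file:

gzip -l ${DATABASE}.sql.gz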

Once we have a good process for backing up our database, an important next step is to test those database backups often. Your backup process would be no good to you or your business if one day you needed a backup in an emergency and found out too late that the backup process was not working or not backing up all of the data correctly.

In our next section, we will look at a way to retrieve our backup, restore it to a working database and test it to make sure it has performed its backup correctly.

Testing your database backups

An often-overlooked step is to test the backups you have created. Someday you’ll need one of your backup files and you won’t want to open it up to find an empty file!

One way to test your backup is to create a new database, import the backup file into it and perform a query or two to make sure some recent data exists in the backup, so that you know your recent backup has worked.

To create a temporary database called temp_database use the following command:

mysql --host=${DATABASE_HOST} --user=${DATABASE_USERNAME} --password="${DATABASE_PASSWORD}" -e "CREATE DATABASE IF NOT EXISTS temp_database;"

Next, import your backup into the database. If you compressed the backup earlier, you’ll need to decompress it first, using the following commands:

gzip -d backup.sql.gz
mysql --host=${DATABASE_HOST} --user=${DATABASE_USERNAME} --password="${DATABASE_PASSWORD}" temp_database < backup.sql

This may take a moment or two to complete, depending on the size of your backup.

Once the database has been restored, the next step is to query one or more tables to make sure it has recent data. The table you query really depends on your own database and the information you store there.

One potential example is to query a table called ‘logins’, if you happen to have an application that makes a record of each time a user logs in.

Assuming the table has a date/time field of some kind, you could run a query to check to see if at least 1 record exists in the logins table from yesterday or some period of time that makes sense to you as shown in the following snippet:

mysql --host=${DATABASE_HOST} \
--database=temp_database \
--user=${DATABASE_USERNAME} \
--password=${DATABASE_PASSWORD} \
--skip-column-names \
-e "select count(*) from logins where date > '2020-01-01'" > logins_count.txt

This command will connect to the database and query the logins table to count the number of logins after 1st January 2020. It will save the result to a text file called logins_count.txt.

The --skip-column-names parameter prevents the text file from containing the column name from the table, making it easier to confirm in a script that the file contains only a number and that the number is greater than 0.

If the text file exists and contains a count greater than 0, then your earlier backup appears to have worked correctly.
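
As a rough sketch of that check (assuming logins_count.txt was produced by the query above), a few lines of bash can read the file and report whether the backup looks healthy:

logins_count=$(cat logins_count.txt)

if [ "${logins_count}" -gt 0 ]; then
    echo "Backup contains recent logins"
else
    echo "No recent logins found - check the backup process"
fi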

In our next section, we will look at the backup facility built into AWS’s Relational Database Service (RDS). We will look at how RDS snapshots differ from manual backups and how to enable them.

Using RDS Snapshots

AWS Relational Database Service (RDS) is one of many services provided by AWS. At a basic level, RDS is a database service which can be used to create relational databases using a number of engines such as MySQL, MariaDB, Oracle or Aurora. Aurora is a MySQL-compatible relational database created by AWS.

RDS is tempting to use when you need a database for your application. It can handle database replication, backups and restores for you, which can be difficult to set up and maintain if you are self-hosting a database on one or more EC2 instances.

RDS provides built-in backup functionality. This backup process allows you to choose a retention period of 0 to 35 days, which means RDS will back up your database each day and keep those backups for the number of days you choose, up to 35. If your RDS instance ever runs into difficulty, such as losing data due to an incorrect delete, you can ‘Revert’ to an earlier version of the database within the retention period. ‘Revert’ is a little misleading here, as the process creates a new RDS instance from the backup, leaving your original RDS instance in place.

You can also choose when these backups are performed each day. You can pick a start time, such as a quiet period overnight when your database isn’t too busy, and a window of time, such as an hour, in which the backup may run. While the backup is taking place, your RDS instance can operate as normal.
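
If you prefer the AWS CLI to the console, a hedged sketch of enabling daily automated backups on an existing instance might look like the following (the instance identifier is a placeholder and the window is in UTC):

aws rds modify-db-instance \
    --db-instance-identifier mydatabase \
    --backup-retention-period 14 \
    --preferred-backup-window 02:00-03:00 \
    --apply-immediately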

RDS Snapshots are visible in the Snapshots section of RDS. You have a number of available options when working with snapshots, including:

  • You can Restore the snapshot, meaning you can create a new RDS instance from the backup.
  • You can Copy the snapshot, meaning you can copy the snapshot to another AWS region, such as from us-east-1 to eu-west-1.
  • The Share Snapshot option allows you to make the snapshot Public if you choose to do so, or to make it available to another AWS Account ID.
  • You can also export your snapshot to S3. S3 is AWS’s storage service and the export option allows you to export all or some of the backup.
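
These snapshot operations can also be scripted with the AWS CLI. As a rough sketch (the instance and snapshot identifiers below are placeholders), you could take a manual snapshot and later restore it to a new instance like this:

aws rds create-db-snapshot \
    --db-instance-identifier mydatabase \
    --db-snapshot-identifier mydatabase-snapshot-2020-01-01

aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier mydatabase-restored \
    --db-snapshot-identifier mydatabase-snapshot-2020-01-01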

In our next section, we will look at a relatively new backup service provided within AWS called ‘AWS Backup’. AWS Backup can help to back up a number of different items within AWS and also help to manage their lifecycle, meaning how the backup files are handled over time.

Using AWS Backup service

In 2019, AWS launched a new backup service called AWS Backup. This service allows you to automate backup options that already exist within other AWS services such as RDS.

AWS Backup starts by creating a Backup Plan. Within this plan you can define a schedule such as daily, weekly, monthly or a custom cron expression for more fine-grained control, and you can define a backup window for when the backups should take place.

A Lifecycle section allows you to transition backups to cold storage, a lower-cost storage option. The transition happens automatically after a defined period of time, such as 1 week after the backup was created. You can also choose to expire the backup entirely after a further period of time.

As an example of putting these lifecycle options together, an item can be backed up on the first day of the month and transitioned to cold storage the following day. Later, for example at the end of the year, it could be deleted automatically (AWS requires backups to remain in cold storage for at least 90 days before they can be deleted).
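
As a rough, hedged sketch of creating such a plan from the AWS CLI (the plan name, rule name and day counts are placeholders, and the default backup vault is assumed to exist), the command might look like this:

aws backup create-backup-plan --backup-plan '{
    "BackupPlanName": "monthly-database-backup",
    "Rules": [{
        "RuleName": "monthly-rule",
        "TargetBackupVaultName": "Default",
        "ScheduleExpression": "cron(0 5 1 * ? *)",
        "Lifecycle": {
            "MoveToColdStorageAfterDays": 1,
            "DeleteAfterDays": 365
        }
    }]
}'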

You can create different backup Vaults as a way to use different Key Management Service (KMS) encryption keys for your backups. This can be useful if you work with different customers and want to encrypt each customer’s backups using their own dedicated KMS key.

Like many other AWS services you can add some Tags to your backup plan. This can be useful later for reporting or cost analysis purposes.

Once your Backup plan is created, you can then move on to the step of defining the items to back up. This section is called Resource Assignment. You can give your resource assignment a name and attach an Identity and Access Management (IAM) role which has the permission to perform the backup of the assigned item(s).

The final step is to assign the resources you would like to back up. You can assign resources based on Tagged items, or specific resource IDs such as RDS instances.

You can combine several items to back up. You could back up 1 RDS instance and 1 EC2 instance as part of 1 backup plan and have them both transition to cold storage and expire after a period of time.

Overall AWS Backup is a very useful service to use. You can spend some initial time setting up one or more backup plans and then leave it to work automatically behind the scenes with no additional work so that copies of your databases and EC2 instances are available if you ever need them.

Following on from our discussion in an earlier post, IaC tools such as Terraform or CloudFormation can be used to create AWS Backup plans, so you could create the plans while developing your infrastructure too.

A number of backup options would be useless without AWS S3, so in our next section we will look at S3 and the options it contains to help store and manage our backup files.

Using AWS S3 for storing backups

AWS S3 is an excellent resource for easy and accessible storage. You don’t need to allocate any storage in advance; it grows along with your usage.

If you are developing a web application and users of your application can upload files such as logos or avatar images, storing those items directly in S3 is an ideal solution. When using S3, you need to create one or more “buckets” to start storing your files.

Bucket names need to be unique across all AWS customers, so common bucket names such as “files” are probably already taken. Instead, consider creating a bucket with a name such as {your application name}-files for a better chance of creating a relevant available bucket.

When creating a new S3 bucket or modifying an existing one, you will have a number of options. AWS has added additional options and storage types to S3 over time, but the following are currently the most common and useful:

  • Versioning – With versioning enabled on an S3 bucket, any file uploaded to the bucket can be overwritten while earlier versions of the file are retained. This can act as a form of backup, and versioning must be enabled to use the Replication feature.
  • Replication – Replication allows all or some items in a bucket to be copied automatically to another S3 bucket. When an object is copied, it can change to a different storage class. This means that an object can be placed in your S3 bucket and immediately copied to another S3 bucket using a lower-cost storage class such as Glacier.
  • Lifecycles – Just like the AWS Backup service described earlier, a lifecycle can be applied to an S3 bucket. A lifecycle rule can change the storage class of objects or delete them after a period of time (see the example after this list).
  • Storage Types – S3 provides a number of different storage classes: S3 Standard, S3 Intelligent-Tiering, S3 Standard-Infrequent Access (S3 Standard-IA), S3 One Zone-Infrequent Access (S3 One Zone-IA) and S3 Glacier. Each class trades something, such as availability or retrieval time, for a lower cost. If you are unsure, use the S3 Standard storage class.
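
As a rough sketch of enabling versioning and a lifecycle rule from the AWS CLI (the bucket name is a placeholder and the day counts are examples only), the commands might look like this:

aws s3api put-bucket-versioning --bucket myapplication-backup-bucket --versioning-configuration Status=Enabled

aws s3api put-bucket-lifecycle-configuration --bucket myapplication-backup-bucket --lifecycle-configuration '{
    "Rules": [{
        "ID": "archive-then-expire-backups",
        "Status": "Enabled",
        "Filter": { "Prefix": "" },
        "Transitions": [{ "Days": 30, "StorageClass": "GLACIER" }],
        "Expiration": { "Days": 365 }
    }]
}'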

AWS provides an S3 FAQ for any further questions you might have: https://aws.amazon.com/s3/faqs/

Now that we know about S3 buckets and how to enable options such as a lifecycle, in our next section we will look at ways to get our backups uploaded to S3.

Backing up your files to S3

In an earlier post, in a section called ‘Installing the AWS CLI’, we described using the AWS CLI to create resources on AWS such as EC2 instances.

We can also use the AWS CLI to create and manage S3 buckets.

To create a new S3 bucket from your terminal, using the AWS CLI use the following command:

aws s3api create-bucket --bucket myapplication-backup-bucket
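
Note that if your AWS CLI is configured for a region other than us-east-1, S3 may require a location constraint when the bucket is created; a hedged example for eu-west-1 would be:

aws s3api create-bucket --bucket myapplication-backup-bucket --region eu-west-1 --create-bucket-configuration LocationConstraint=eu-west-1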

To list your S3 buckets you can use:

aws s3 ls

To list the contents of an S3 bucket use:

aws s3 ls s3://{your bucket name}

To copy a file from your local machine to an S3 bucket use:

aws s3 cp file.txt s3://{my bucket name}/file.txt

In our next section we put some of our AWS CLI knowledge to good use. We will write a script to back up a database and upload the resulting backup file to AWS S3 for storage.

Database backup with Mysqldump and upload to S3

Putting together the items we have learned above, we can create a script that connects to a database, performs a backup, compresses the backup file and uploads it to S3. Create a new file called backup.sh with the following content:

#!/usr/bin/env bash
# The following assumes you have an AWS account with the AWS CLI installed locally
# It will perform a mysqldump of a database, compress and upload the resulting file

DATABASE_HOST="localhost"
DATABASE_USERNAME="user"
DATABASE_PASSWORD="password"
DATABASE="myapplication"
S3_BUCKET="myapplication-backup-bucket"

# dump the database to a local file
mysqldump --host=${DATABASE_HOST} --user=${DATABASE_USERNAME} --password="${DATABASE_PASSWORD}" --single-transaction ${DATABASE} > ${DATABASE}.sql

# compress the sql file using gzip
gzip ${DATABASE}.sql

# upload the compressed database file to s3
aws s3 cp ${DATABASE}.sql.gz s3://${S3_BUCKET}/${DATABASE}.sql.gz

# list contents of the s3 bucket to confirm the upload
aws s3 ls s3://${S3_BUCKET}

# remove the local compressed database file
rm ${DATABASE}.sql.gz
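
To run this backup automatically each night, one option is a cron entry. As a hedged sketch (assuming the script is saved at /home/ubuntu/backup.sh and the user running cron has the AWS CLI and database credentials available), the following entry runs the script at 2am every day and appends its output to a log file:

0 2 * * * bash /home/ubuntu/backup.sh >> /home/ubuntu/backup.log 2>&1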

Once your database is backed up it is important to test that the backup was successful. We will create two scripts to validate our backup. The first script will download the database backup and use it to create a local database. The second script will query the local database to make sure it has some data, to confirm it was backed up correctly.

In your editor, create a new file called prepare.sh with the following content:

#!/usr/bin/env bash
# The following assumes you have an AWS account with the AWS CLI installed locally
# It will download a backup file, create a local database and import your data, ready for validation.

DATABASE_HOST="localhost"
DATABASE_USERNAME="user"
DATABASE_PASSWORD="password"
DATABASE="myapplication"
S3_BUCKET="myapplication-backup-bucket"

# download a backup file from s3
aws s3 cp s3://${S3_BUCKET}/${DATABASE}.sql.gz ${DATABASE}.sql.gz

# uncompress the backup file
gzip -d ${DATABASE}.sql.gz

# create a local database called 'validate'
mysql --host=${DATABASE_HOST} --user=${DATABASE_USERNAME} --password="${DATABASE_PASSWORD}" -e "create database if not exists validate;"

# import the backup into the 'validate' database
mysql --host=${DATABASE_HOST} --user=${DATABASE_USERNAME} --password="${DATABASE_PASSWORD}" validate < ${DATABASE}.sql

The above script will connect to AWS S3 and download our backup file. It will then decompress the file and import it into a local database called validate.

You may need to change the variables at the beginning of the file to match your database settings so the script can connect successfully.

Our next step is to query some of the data in the local database to verify that the data was backed up correctly.

In your editor, create a new file called validate.sh with the following content:

#!/usr/bin/env bash
# The following assumes you have a local database already created to query
# It will perform a query that should have at least 1 resulting record.

DATABASE_HOST="localhost"
DATABASE_USERNAME="user"
DATABASE_PASSWORD="password"
DATABASE="myapplication"

# query the data and store the result in a file
mysql --host=${DATABASE_HOST} --user=${DATABASE_USERNAME} --password="${DATABASE_PASSWORD}" --database=validate --skip-column-names -e "select count(*) from users" > users_count.txt

# read the number of records from the file created by the previous query
record_count=$(cat users_count.txt)

# if there are more than 0 records, the backup file is OK
if [ ${record_count} -gt 0 ]; then
    echo "Database backed up successfully"
else
    echo "No records, the database didn't back up properly"
fi

# clean up local files
rm users_count.txt
rm ${DATABASE}.sql

# drop the 'validate' database
mysql --host=${DATABASE_HOST} --user=${DATABASE_USERNAME} --password="${DATABASE_PASSWORD}" -e "drop database if exists validate;"

Now that we have created our files, we need to run them and confirm the results by following these steps:

In your terminal run the following command to run the first script:

bash prepare.sh

The script will download the backup file from S3 and create a local database ready to query.

Next run the second script to query the data:

bash validate.sh

If your data backed up correctly and there is data in your users table, then you should see an output of:

Database backed up successfully.

If there is no data in the table, or some other issue with your database then you will see an output of:

No records, the database didn’t back up properly.

If you receive this error, repeat the steps from earlier in this post to make sure you have backed up your data correctly so that the validation process can run successfully.

Database backup and upload to S3 with Ansible

To use Ansible to back up our database and upload the file to S3, we will create two small Ansible roles:

  • The first role will connect to the database and dump it to a local file and compress it using gzip.
  • The second role will create an s3 bucket and upload the compressed backup file.

In the following steps we will create the files necessary for each role. Before you begin, make sure you have an AWS access key and AWS secret key ready from the IAM section of your AWS account.

The first step is to create an Ansible Vault file to store the AWS credentials that Ansible will use to create an S3 bucket and upload a file to it:

ansible-vault create group_vars/all/pass.yml

Once the file is opened, add the following content to the file:

ec2_access_key: [your aws access key]
ec2_secret_key: [your aws secret key]

You will be asked to provide a password. Remember this password as you will need it any time you run the playbook.

Next, create backup.yml with the following content. You can change the database variables to suit your own database details:


- name: Backup a database
  hosts: localhost
  connection: local
  gather_facts: no

  vars:
    database: myapplication
    database_username: user
    database_password: password
    database_host: localhost

  roles:
    - backup

The next file we will create is the backup role. Create a file at the location roles/backup/tasks/main.yml with the following content:

- name: Dump database
  mysql_db:
    state: dump
    login_host: "{{ database_host }}"
    login_user: "{{ database_username }}"
    login_password: "{{ database_password }}"
    name:
      - "{{ database }}"
    target: "{{ database }}.sql"

- name: Compress database backup
  archive:
    path: "{{ database }}.sql"
    dest: "{{ database }}.sql.gz"
    format: gz

For the S3 playbook, create a file called s3.yml with the following content:


- name: Backup and upload to s3
  hosts: localhost
  connection: local
  gather_facts: no

  vars:
    bucket: myapplication-backup-bucket-ansible
    file: myapplication.sql.gz

  roles:
    - s3

Create the s3 role file at roles/s3/tasks/main.yml with the following content:

- name: Create an s3 bucket
  aws_s3:
    bucket: "{{ bucket }}"
    mode: create

- name: Upload backup file to s3
  aws_s3:
    bucket: "{{ bucket }}"
    object: "{{ file }}"
    src: "{{ file }}"
    mode: put

Finally, create one last file called requirements.txt, containing the Python dependencies Ansible needs to perform the database dump and the S3 upload:

pymysql
boto3
botocore

Use the following command to make sure the dependencies are installed for Ansible to work:

pip install -r requirements.txt

Next, run the backup playbook to back up the database and compress the file:

ansible-playbook backup.yml --ask-vault-pass

Then run the S3 playbook to create an S3 bucket and upload the database backup to it:

ansible-playbook s3.yml --ask-vault-pass

The result is that Ansible will back up your database, compress the output file, create an S3 bucket and upload the compressed backup file to your new bucket.

The source code for using Ansible to back up a database and upload the resulting file to S3 is available here: https://github.com/gordonmurray/mysqldump_s3_ansible

In this post we covered the importance of backing up a database, as well as validating that the backup was successful by querying a table for recent data. We learned about the Snapshots option in AWS Relational Database Service (RDS) and how AWS S3 is an ideal location to store backed up data. We combined the steps we learned into a number of useful scripts to help automate the process of backing up and validating a database. We also looked at the AWS Backup service, which can back up tagged items in our AWS account automatically. To expand on our existing knowledge of Ansible from earlier posts, we also learned how to perform backups using an Ansible playbook.

