Backing up Databases with IBM Spectrum Protect
Database backups are a bit special, as a database usually consists of a number of physical files that must all be backed up as a single unit, often with consistent time stamps. Databases also have transaction logs to ensure that the data stored in a database stays consistent, even after a hardware failure, and internal catalogs which record these files, so when you do a restore you need to make sure that the catalogs hold the correct information too. To help with this lot, databases have a Database Management System (DBMS), which tracks the physical database files, transaction logs and backups. A DBMS can usually run a backup while the database is active, which effectively means no backup window is required.
Snapshot backups are not really suitable either, as a DBMS will often hold updates in memory buffers for efficiency, and a snapshot will not capture the data held in those buffers. You need to use the DBMS utility for backups, as that flushes the buffers and backs up all the parts of a database consistently.
SPFS - a filesystem for Spectrum Protect
What is SPFS?
Cassandra, MongoDB, MariaDB, MySQL, PostgreSQL, Progress OpenEdge, SAP Adaptive Server Enterprise, SAP IQ Server, SAP SQL Anywhere, SQL Server Express. All of these databases have one thing in common: there is no backup agent to protect them with Spectrum Protect. However, SPFS can protect any kind of database or application, without any need for special training. DBAs can use their existing DBMS skills to get a secure, offsite backup, without needing to learn how the backup software works.
SPFS works by mounting Spectrum Protect filespaces on your server; you then back up anything by copying data onto that mount point. All operations that go via this mount point are translated directly into Spectrum Protect Client API calls, so backup data is not staged on local disk. You can use any of the Spectrum Protect features directly on your client, including de-duplication, compression, encryption and filtering.
The solution can be used to protect commercial databases such as Oracle, DB2 and SQL Server, as well as the open source databases above. It is easy to integrate with Oracle and PostgreSQL to protect the transaction logs or archive logs directly as they are created.
How does SPFS work?
First you need to mount the file system as type spfs. In the example below we call the mounted directory '/backup':
# mount -t spfs /backup
Now simply run your standard backup command, pointing the output to a location in the /backup directory. For example, for a MySQL database, a PostgreSQL database and a MongoDB database:
# mysqldump --all-databases > /backup/mysqldump.out
# pg_dump > /backup/pg_dump.out
# mongodump --out /backup/mongodump
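If you want to drive these dumps from a scheduler rather than typing them at a shell, a small wrapper is enough. The sketch below is a hypothetical helper, not part of SPFS: it runs any dump utility with its standard output redirected into a file under the SPFS mount (assumed to be /backup) and fails loudly if the utility returns a non-zero exit code.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def dump_to_spfs(command: list[str], target: Path) -> int:
    """Run a DBMS dump utility with its stdout redirected into `target`.

    `target` is assumed to live under an SPFS mount such as /backup, so every
    write is passed on to Spectrum Protect. Returns the size of the dump.
    """
    with open(target, "wb") as out:
        # The utility writes straight into the file; Python does not buffer the dump.
        result = subprocess.run(command, stdout=out, stderr=subprocess.PIPE)
    if result.returncode != 0:
        message = result.stderr.decode(errors="replace").strip()
        raise RuntimeError(f"{command[0]} exited with {result.returncode}: {message}")
    return target.stat().st_size
```

For example, dump_to_spfs(["pg_dump", "mydb"], Path("/backup/pg_dump.out")), where "mydb" is a placeholder database name.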
SPFS takes the data and passes it over to Spectrum Protect through the Spectrum Protect client API, where it is stored according to the normal Spectrum Protect policy settings.
Now, everyone knows that there is no point in taking backups if you can't restore them. To run a restore you simply use the standard DBMS utility, reading the data from the /backup SPFS directory like this:
# mysql < /backup/mysqldump.out
OK, so how do you find out which backups are available for restore? You don't need complicated search commands or access to the Spectrum Protect database; you simply use operating system file commands like ls and dir. For example:
# ls /backup
Database logs can be protected as they are created by simply making them write to the /backup directory. For example, with PostgreSQL, turn on WAL archiving (archive_mode = on) and set archive_command so that each completed WAL segment is copied to the SPFS mount point.
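As a sketch of what archive_command could call, here is a hypothetical helper script. The script path /usr/local/bin/spfs_wal_archive.py and the target directory /backup/wal are assumptions for illustration. PostgreSQL substitutes %p (path of the WAL file) and %f (its file name), treats exit code 0 as success and retries on anything else; the helper refuses to overwrite an existing archived segment, as the PostgreSQL documentation recommends.

```python
"""Hypothetical WAL archive helper for an SPFS mount.

Assumed postgresql.conf settings:
    archive_mode = on
    archive_command = 'python3 /usr/local/bin/spfs_wal_archive.py %p %f'
"""
import shutil
import sys
from pathlib import Path

ARCHIVE_DIR = Path("/backup/wal")  # assumed directory on the SPFS mount


def archive_wal(wal_path: str, wal_name: str, archive_dir: Path = ARCHIVE_DIR) -> int:
    """Copy one WAL segment; return 0 on success, non-zero so PostgreSQL retries."""
    dest = archive_dir / wal_name
    if dest.exists():
        # Never overwrite an archived segment.
        print(f"{dest} already exists", file=sys.stderr)
        return 1
    try:
        shutil.copyfile(wal_path, dest)
    except OSError as exc:
        print(f"archiving {wal_name} failed: {exc}", file=sys.stderr)
        return 1
    return 0


# Only act when PostgreSQL invokes the script with %p and %f.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(archive_wal(sys.argv[1], sys.argv[2]))
```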
For Oracle, write a second copy of each archived redo log directly to SPFS by adding a line like this to the init.ora file (assuming SPFS is mounted at /spfs):
log_archive_dest_2='LOCATION=/spfs OPTIONAL'
This backs up each archive log to the Spectrum Protect backup server as soon as it is created.
SPFS is a WORM (Write Once Read Many) filesystem. Once data has been saved on the SPFS mount point it can't be changed, so the backups are secure. Also, data cannot be deleted from the client side unless the backup administrator has granted client deletion by setting backdel=yes.
For extra security you can also encrypt the data using a private key. Data can be encrypted in 3 ways:
- DATA TRANSFER - All communication between the client and the backup server uses an encrypted communication protocol.
- ENCRYPTED STORAGE - Data can be encrypted before it is physically written to the media (tape, disk or whatever media is used in the back-end storage attached to the backup server).
- CLIENT ENCRYPTION - All or selected contents of the filesystem can be encrypted before they are sent to the backup server.
Spectrum Protect Features
SPFS makes extensive use of Spectrum Protect features, so while a DBA can use it with virtually no training, it does need a Spectrum Protect administrator to set it up. It uses Spectrum Protect management classes to decide how the different kinds of backup data are treated. These management classes are assigned with standard include/exclude lists at the client, or client option sets at the server. However, presenting the backups as a file system gives the administrator an opportunity to present this retention and storage classification in a unique way. For example, let's imagine you have the SPFS file system organised with the following subdirectories: fast, secret, safe, short-retention and long-retention.
Now you set up management classes so that:
- /backup/fast/* is stored on SSD
- /backup/secret/* is encrypted
- /backup/safe/* has extra copies in multiple storage pools, in different locations
- /backup/short-retention/* has 2 days retention
- /backup/long-retention/* has 365 days retention
Easy to use, and easy to set up. This is quite a simplistic example, of course. You can go much further using the power of TSM (Tivoli Storage Manager, the former name of Spectrum Protect) management classes, and fine-tune how long you keep backups, where the data is stored, how many copies are kept and where each copy is held, for different classes of backups.
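To make the directory-to-policy idea concrete, here is a small model of the layout above. The management class names, pool names, the 30-day default and the choice of 2 copies for "safe" are all hypothetical values picked for illustration; in a real setup these attributes live in Spectrum Protect policy definitions, not in client code. The lookup walks from the deepest directory upwards, so a rule on a subdirectory wins over one on its parent.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Policy:
    mgmtclass: str                 # hypothetical management class name
    storage: str = "diskpool"      # hypothetical primary storage pool
    retention_days: int = 30       # hypothetical default retention
    copies: int = 1                # copies kept in storage pools
    encrypted: bool = False


# One entry per SPFS subdirectory from the example layout.
POLICIES = {
    "/backup/fast": Policy("FAST", storage="ssdpool"),
    "/backup/secret": Policy("SECRET", encrypted=True),
    "/backup/safe": Policy("SAFE", copies=2),
    "/backup/short-retention": Policy("SHORTRET", retention_days=2),
    "/backup/long-retention": Policy("LONGRET", retention_days=365),
}
DEFAULT = Policy("STANDARD")


def policy_for(path: str) -> Policy:
    """Return the policy of the deepest configured directory containing `path`."""
    p = PurePosixPath(path)
    for candidate in (p, *p.parents):   # deepest first, "/" last
        policy = POLICIES.get(str(candidate))
        if policy is not None:
            return policy
    return DEFAULT
```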
You can use INCLUDE and EXCLUDE statements to select or deselect data, for example to stop certain types of file, such as *.mp3, from being backed up.
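The Spectrum Protect client documentation describes the include-exclude list as being processed from the bottom up, stopping at the first matching statement, with unmatched files backed up under the default management class. The sketch below models that rule. It uses Python's fnmatch, which is a simplification: its "*" also matches "/", whereas Spectrum Protect has its own wildcard syntax. The rule list and the LONGRET class name are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

Rule = Tuple[str, str, Optional[str]]  # (action, pattern, management class)

# Hypothetical list, written top to bottom as it would appear in the options file.
RULES: List[Rule] = [
    ("include", "/backup/*", None),
    ("include", "/backup/long-retention/*", "LONGRET"),
    ("exclude", "*.mp3", None),
]


def evaluate(path: str, rules: List[Rule]) -> Tuple[bool, Optional[str]]:
    """Return (backed_up, mgmtclass); a mgmtclass of None means the default class."""
    for action, pattern, mgmtclass in reversed(rules):  # bottom-up, first match wins
        if fnmatchcase(path, pattern):
            if action == "exclude":
                return False, None
            return True, mgmtclass
    return True, None  # no statement matched: included with the default class
```

Because the exclude sits at the bottom, it is checked first, so an .mp3 file is skipped even inside a directory that has its own include statement.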
SPFS can also use Spectrum Protect data reduction techniques such as compression and de-duplication.
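To show why de-duplication pays off for repeated database dumps, here is a generic illustration of the technique: split data into chunks, identify each chunk by a hash, store each unique chunk only once, and compress it. This is not Spectrum Protect's implementation, whose chunking and compression differ in detail; the fixed 4 KiB chunk size is an arbitrary choice for the example.

```python
import hashlib
import zlib
from typing import Dict, List

CHUNK_SIZE = 4096  # illustrative fixed chunk size


class DedupStore:
    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}  # SHA-256 digest -> compressed chunk

    def put(self, data: bytes) -> List[str]:
        """Store data and return its recipe: the ordered list of chunk digests."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # only chunks never seen before use space
                self.chunks[digest] = zlib.compress(chunk)
            recipe.append(digest)
        return recipe

    def get(self, recipe: List[str]) -> bytes:
        """Rebuild the original data from its recipe."""
        return b"".join(zlib.decompress(self.chunks[d]) for d in recipe)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())
```

Two nightly dumps that differ only in their last block share every other chunk, so the second dump costs roughly one compressed chunk of storage.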
Archive old Data off to Spectrum Protect Storage
The full syntax of the SPFS mount command is:
mount -t spfs DATATYPE backup/archive /dirname
The default datatype is BACKUP, so it was not shown in the backup example above. However, if you mount the filespace with DATATYPE ARCHIVE, this will invoke the Spectrum Protect archive utility instead. So, let's assume you mount an archive SPFS like this:
mount -t spfs DATATYPE ARCHIVE /archive
Now any data that you move into that archive directory will be archived to whatever TSM storage is appropriate: disk, tape or cloud. This frees up space on your client, but the files remain visible in that archive directory, and can be browsed or copied back using standard cp or drag and drop. Unlike conventional hierarchical storage management (HSM) products, there is no need for any stub files.
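An archiving policy such as "move anything untouched for 30 days" can be scripted with ordinary file operations, since the archive is just a directory. The sketch below is a hypothetical helper, assuming the archive SPFS is mounted at /archive; the 30-day threshold in the example is likewise an assumption. It keeps each file's relative path, and because the source and the SPFS mount are different file systems, shutil.move copies the file and then deletes the original, which is what frees the local space.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional


def archive_older_than(source: Path, archive_root: Path, days: float,
                       now: Optional[float] = None) -> List[Path]:
    """Move files not modified for `days` days from source into archive_root.

    Relative paths are preserved. Returns the new locations of moved files.
    """
    cutoff = (time.time() if now is None else now) - days * 86400
    moved = []
    # sorted() takes a snapshot of the tree before anything is moved.
    for path in sorted(source.rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            dest = archive_root / path.relative_to(source)
            dest.parent.mkdir(parents=True, exist_ok=True)
            # Across file systems this is copy-then-delete, freeing local space.
            moved.append(Path(shutil.move(str(path), str(dest))))
    return moved
```

For example, archive_older_than(Path("/data/reports"), Path("/archive/reports"), days=30), where /data/reports is a placeholder source directory.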
A quick overview of how Spectrum Protect manages older backups: Spectrum Protect only backs up files if they have changed since the previous backup. All files are backed up on the first run, and only changed files thereafter. Spectrum Protect usually keeps more than one backup copy of a file. The most recent backup of a file is called the 'active' backup and is retained indefinitely (unless it is deleted). Older backups are called 'inactive' backups. When TSM creates a new backup, the previous active backup becomes inactive, and the new backup becomes the active one.
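The "back up everything once, then only what changed" rule can be sketched with a catalog of file fingerprints. This is a simplified model: it treats a file as changed when its size or modification time differs from the last recorded value, whereas Spectrum Protect's real change detection looks at more attributes. The function name and catalog format are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Tuple

Fingerprint = Tuple[int, int]  # (size in bytes, mtime in nanoseconds)


def incremental(root: Path, catalog: Dict[str, Fingerprint]) -> List[str]:
    """Return the files under root that need backing up, and update the catalog.

    On the first run the catalog is empty, so every file is selected.
    """
    changed = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        st = path.stat()
        fingerprint = (st.st_size, st.st_mtime_ns)
        key = str(path.relative_to(root))
        if catalog.get(key) != fingerprint:
            changed.append(key)
            catalog[key] = fingerprint
    return changed
```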
This versioning model raises two potential issues with SPFS:
- Backups are presented as files. What happens if someone uses the 'rm' or 'rmdir' command and tries to delete all the backups?
- If you want to restore a version of a file from several days ago, how do you select an older backup?
If backdel=yes is set in the client node definition, then the client is allowed to delete backups, so the delete command will be honoured and the backups deleted. This setting is typically used for Oracle RMAN clients, which manage their own retention and delete obsolete backups themselves. For other clients it would not normally be a good idea, so backdel=yes should be used with caution.
Otherwise, if the client is not authorised to delete backups and executes a delete command, this will mark the object as inactive in the backup system. The retention of the backups will then follow normal retention policy rules.
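The combination of write-once storage, active/inactive versions and the backdel setting can be summarised in a toy in-memory model. This is not SPFS code and it does not model expiry by retention policy; it only shows the state changes described above: a new write never modifies an existing version, it adds one and deactivates the previous one, and a delete either removes the objects (backdel=yes) or just deactivates them.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Version:
    data: bytes
    active: bool = True


@dataclass
class Filespace:
    backdel: bool = False  # mirrors backdel=yes on the client node
    objects: Dict[str, List[Version]] = field(default_factory=dict)

    def write(self, name: str, data: bytes) -> None:
        versions = self.objects.setdefault(name, [])
        for v in versions:
            v.active = False  # the previous active version becomes inactive
        # Existing versions are never modified (write once): a new one is added.
        versions.append(Version(data))

    def delete(self, name: str) -> None:
        if self.backdel:
            self.objects.pop(name, None)  # client is allowed to delete backups
        else:
            for v in self.objects.get(name, []):
                v.active = False  # only deactivated; retention policy decides the rest
```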
If there are multiple backup versions of a file, then all of them are presented on SPFS, but the names of older versions have a version number appended. For example, suppose a client saves a file called 'important.txt' every morning. The backups will be presented on SPFS like this:
- important.txt - the active, or most recent, backup of 'important.txt', created today
- important.txt(-1) - the second most recent version, created yesterday
- important.txt(-2) - the third most recent version, created the day before yesterday
If you are familiar with z/OS mainframes, you will recognise that this is just how GDG (generation data group) files work.
This means you can see all the backup versions of important.txt in the SPFS file system without having to run Spectrum Protect 'query backup' commands with the '-inactive' option, and you can simply browse a backup to see what it contains.
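Because versions are just names in a directory listing, a script can pick an older version with plain string handling. The sketch below assumes exactly the naming shown above (name for the active version, name(-N) for older ones); the function names are hypothetical. It groups a listing by base name, newest first, and builds the path of the version N generations back.

```python
import re
from typing import Dict, List, Tuple

# "important.txt(-2)" -> base "important.txt", generation "-2"
VERSION_RE = re.compile(r"^(?P<base>.+?)(?:\((?P<gen>-\d+)\))?$")


def parse_name(entry: str) -> Tuple[str, int]:
    """Return (base name, generation); the active version is generation 0."""
    m = VERSION_RE.match(entry)
    return m.group("base"), int(m.group("gen") or 0)


def versions_by_file(entries: List[str]) -> Dict[str, List[str]]:
    """Group directory entries by base name, newest version first."""
    groups: Dict[str, List[Tuple[int, str]]] = {}
    for entry in entries:
        base, gen = parse_name(entry)
        groups.setdefault(base, []).append((gen, entry))
    return {b: [e for _, e in sorted(v, reverse=True)] for b, v in groups.items()}


def version_path(mount: str, name: str, generations_back: int = 0) -> str:
    """Path of a version: 0 is the active backup, 1 the one before it, and so on."""
    if generations_back == 0:
        return f"{mount}/{name}"
    return f"{mount}/{name}(-{generations_back})"
```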
The SPFS product is a multithreaded application that uses backend workers with a connection pool. If a connection in the pool already has a working session on the Spectrum Protect server, that connection is reused for new file operations in preference to opening a new session.
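The general connection-pool pattern looks like this. It is a generic sketch, not SPFS's code: open_session stands in for whatever actually opens a server session, and the LIFO queue is a design choice that hands out the most recently used (warmest) session first. A new session is only opened when no idle one exists and the pool is below its limit; otherwise the caller waits for a session to be returned.

```python
import queue
import threading
from contextlib import contextmanager
from typing import Callable, Iterator


class SessionPool:
    """Hand out idle sessions before opening new ones, up to max_sessions."""

    def __init__(self, open_session: Callable[[], object], max_sessions: int) -> None:
        self._open = open_session        # hypothetical: opens one server session
        self._idle = queue.LifoQueue()   # most recently used session is reused first
        self._lock = threading.Lock()
        self._created = 0
        self._max = max_sessions

    def _acquire(self) -> object:
        try:
            return self._idle.get_nowait()  # an already-open session is preferred
        except queue.Empty:
            pass
        with self._lock:
            if self._created < self._max:
                self._created += 1
                try:
                    return self._open()
                except Exception:
                    self._created -= 1
                    raise
        return self._idle.get()  # pool is full: wait for a session to come back

    @contextmanager
    def session(self) -> Iterator[object]:
        s = self._acquire()
        try:
            yield s
        finally:
            self._idle.put(s)  # keep the session open for the next operation
```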
The SPFS product also has a metadata cache, which avoids extra Spectrum Protect API calls to look up metadata about files and directories.
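A metadata cache of this kind is usually a map from path to (timestamp, metadata) with an expiry time. The sketch below is a generic illustration under assumptions: the lookup callable stands in for a server query, the 5-second default TTL is arbitrary, and SPFS's real cache policy is not documented here. The injectable clock only exists to make the behaviour easy to test.

```python
import time
from typing import Callable, Dict, Tuple


class MetadataCache:
    """Cache per-path metadata lookups for `ttl` seconds."""

    def __init__(self, lookup: Callable[[str], dict], ttl: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup  # hypothetical: one API round trip per call
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, path: str) -> dict:
        now = self._clock()
        hit = self._entries.get(path)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # served locally, no API call
        meta = self._lookup(path)
        self._entries[path] = (now, meta)
        return meta

    def invalidate(self, path: str) -> None:
        """Drop a cached entry, e.g. after the file is written or deleted."""
        self._entries.pop(path, None)
```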
There is an asynchronous data transfer queue for each worker which improves write and read performance.
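The benefit of an asynchronous transfer queue is that the application's write call returns as soon as the data is queued, while a background thread does the slower network send. The sketch below shows that pattern in general terms; the send callable stands in for the real transfer and the queue depth of 8 is an arbitrary assumption. A bounded queue gives back-pressure, and any send error is remembered and reported when the writer is closed.

```python
import queue
import threading
from typing import Callable, Optional


class AsyncWriter:
    """Queue chunks for a background sender thread."""

    _STOP = object()

    def __init__(self, send: Callable[[bytes], None], depth: int = 8) -> None:
        self._send = send  # hypothetical stand-in for the real transfer call
        self._q: queue.Queue = queue.Queue(maxsize=depth)
        self._error: Optional[Exception] = None
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        while True:
            item = self._q.get()
            if item is self._STOP:
                return
            if self._error is None:
                try:
                    self._send(item)
                except Exception as exc:
                    self._error = exc  # keep draining so writers never block forever

    def write(self, chunk: bytes) -> None:
        self._q.put(chunk)  # returns at once unless `depth` chunks are pending

    def close(self) -> None:
        """Wait for all queued chunks to be sent, then report any send error."""
        self._q.put(self._STOP)
        self._thread.join()
        if self._error is not None:
            raise self._error
```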
The filesystem also has a readahead feature, which retrieves data from the Spectrum Protect server before the client has requested it, so sequential reads such as restores do not have to wait for each block in turn.
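Readahead can be illustrated with a block reader that, whenever block n is requested, also starts fetching the next few blocks in the background. This is a generic sketch of the technique: fetch stands in for a server read of one block, and the window of 2 blocks is an arbitrary assumption. During a sequential read, each block is fetched exactly once and is usually already on its way by the time it is asked for.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class ReadAheadReader:
    """Sequential block reader that prefetches the next `window` blocks."""

    def __init__(self, fetch: Callable[[int], bytes], nblocks: int, window: int = 2) -> None:
        self._fetch = fetch  # hypothetical: reads one block from the server
        self._n = nblocks
        self._window = window
        self._pool = ThreadPoolExecutor(max_workers=window)
        self._pending: Dict[int, Future] = {}

    def _schedule(self, block: int) -> None:
        if 0 <= block < self._n and block not in self._pending:
            self._pending[block] = self._pool.submit(self._fetch, block)

    def read_block(self, block: int) -> bytes:
        self._schedule(block)
        for ahead in range(1, self._window + 1):
            self._schedule(block + ahead)  # start fetching before it is requested
        return self._pending.pop(block).result()

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```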
For more information, check out the Spictera website.