IBM Spectrum Protect Data Archiving
The Archive Command
Data archiving is intended to preserve a copy of a related set of files as they stood at a point in time for legal or compliance purposes. This set of files might consist of tax records, end-of-project reports or similar.
An Archive will typically be requested by the customer as a one-off process; it would not normally be a scheduled event. An Archive will usually be retained for several years.
If an Archive is required again, then normally the entire set of files is brought back to a new location, and the process is called 'Retrieve'.
An Archive is not the same as a Backup, which typically involves copying an unrelated set of changed files to tape and retaining them for a relatively short time. A Backup also does not affect the source data, whereas an Archive can delete the source data. An Archive is not the same as HSM migration either, which involves moving older files off primary disk to cheaper storage. Migration is about managing disk space; Archive is about retaining data.
The Spectrum Protect ARCHIVE command can be used in the following ways. Each of them will only work if the files pick up a suitable management class with an archive copy group defined.
- ARCHIVE pathname.filename
- Will simply archive a file
- ARCHIVE pathname/* -deletefiles -subdir=yes
- Will archive files in a directory including subdirectories then delete them from source
- ARCHIVE -filelist=textfilename
- You create a list of files to be archived and put them in a text file. The Archive command reads this list and archives the data off. The file list must include the full path name, so this can be used to archive selected files from different paths
- ARCHIVE -filelist=textfilename -archmc=mc7yrs -deletefiles -description="End of year data for inland revenue, requested by Colin Green"
- Archive a list of files, bind the archive to a seven year management class (that you have previously set up with a seven year retention), delete the originals and give it a meaningful description (254 chars max)
- ARCHIVE pathname.filename -v2Archive
- Use the v2archive option to generate secondary description tables - see the performance section below
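The file list form of the command can be scripted. The following is a minimal Python sketch, not a definitive implementation: it assumes the command line client is installed as dsmc, and the helper names write_filelist and build_archive_command are hypothetical. It only builds the argument list, using the options described above, and does not run anything.

```python
from pathlib import Path

MAX_DESCRIPTION_LENGTH = 254  # limit quoted in the text above


def write_filelist(paths, filelist_path):
    """Write one full path per line, as the -filelist option expects."""
    lines = []
    for p in paths:
        p = Path(p)
        if not p.is_absolute():
            raise ValueError(f"file list entries need a full path: {p}")
        lines.append(str(p))
    Path(filelist_path).write_text("\n".join(lines) + "\n")
    return filelist_path


def build_archive_command(filelist_path, mgmt_class=None,
                          description=None, delete_files=False):
    """Return the argument list for the archive; nothing is executed here."""
    # Assumption: the client executable is called "dsmc" and is on the PATH.
    args = ["dsmc", "archive", f"-filelist={filelist_path}"]
    if mgmt_class:
        args.append(f"-archmc={mgmt_class}")
    if delete_files:
        args.append("-deletefiles")
    if description is not None:
        if len(description) > MAX_DESCRIPTION_LENGTH:
            raise ValueError("description exceeds 254 characters")
        args.append(f"-description={description}")
    return args
```

The returned list can be passed to subprocess.run as is. Because each option is a separate list element, a description containing spaces needs no extra shell quoting.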
It is possible to archive data using the Web client GUI, but you get fewer options than with the command line.
The Retrieve Process
The following command line options are available to retrieve the data. The different options can be used in combination.
- RETRIEVE pathname.filename
- Will simply retrieve an archived file to the original location. You will be prompted if the data already exists
- RETRIEVE pathname/* newpathname/
- Retrieve all files in a directory to a new location
- RETRIEVE pathname/* -pick
- Get a pick list of archived files from a specified directory. You can then select those files you want retrieved.
- RETRIEVE -filelist=textfilename
- Retrieve a list of files that are specified in a file
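The retrieve variants above can be assembled in the same way. This is a hedged Python sketch: the function name build_retrieve_command is hypothetical, the client is assumed to be installed as dsmc, and not every combination of options has been verified against a real client.

```python
def build_retrieve_command(source=None, destination=None,
                           pick=False, filelist=None):
    """Return the argument list for a retrieve; nothing is executed here."""
    # Exactly one of a file specification or a file list must be supplied.
    if (source is None) == (filelist is None):
        raise ValueError("give exactly one of source or filelist")
    # Assumption: the client executable is called "dsmc" and is on the PATH.
    args = ["dsmc", "retrieve"]
    if filelist is not None:
        args.append(f"-filelist={filelist}")
    else:
        args.append(source)
    if destination is not None:
        # A trailing path retrieves the files to a new location.
        args.append(destination)
    if pick:
        args.append("-pick")
    return args
```

For example, build_retrieve_command("pathname/*", "newpathname/") corresponds to the second form in the list above.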
It is also possible to retrieve files with the GUI.
Finding Archived files
To find individual archive runs you could use an SQL query
select NODE_NAME,ARCHIVE_DATE,CLASS_NAME,DESCRIPTION from ARCHIVES where NODE_NAME='node'
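The ARCHIVES table holds one row per archived object, so the query above can return many rows for a single archive run. The Python sketch below builds the same query for a given node and then counts the objects per description. It rests on several assumptions: that node names are stored in upper case, that the output was captured from the administrative client in a comma-delimited, data-only form, and that fields containing commas are quoted. Both function names are hypothetical.

```python
import csv
import io
from collections import Counter


def archive_runs_query(node_name):
    """Build the select statement shown above for one node."""
    # Assumption: node names are stored in upper case on the server.
    # Single quotes are doubled so the name cannot break the SQL literal.
    safe = node_name.upper().replace("'", "''")
    return ("select NODE_NAME,ARCHIVE_DATE,CLASS_NAME,DESCRIPTION "
            f"from ARCHIVES where NODE_NAME='{safe}'")


def count_objects_per_description(comma_delimited_output):
    """Count rows per archive description in comma-delimited query output."""
    counts = Counter()
    for row in csv.reader(io.StringIO(comma_delimited_output)):
        if len(row) != 4:
            continue  # skip blank lines or anything that is not a data row
        counts[row[3]] += 1
    return counts
```

Grouping by description is only useful if each archive run was given its own description.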
One easy way to find archived files is to use the Spectrum Protect Web Client GUI. Point your browser to http://servername:1581 and select the retrieve option. This will list all archives for that server, and you can drill down into each archive to find individual files.
If you need to provide a list of archived files, then from the command line you can use the QUERY ARCHIVE command and pipe the output into a file for perusal by a user.
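Saving the QUERY ARCHIVE output to a file can also be done from a script. The sketch below is one possible approach in Python. It assumes the client is installed as dsmc and that including subdirectories with -subdir=yes is wanted. The function name save_archive_listing and its runner parameter are hypothetical; the runner exists only so that the function can be exercised without a real client.

```python
import subprocess


def save_archive_listing(file_spec, output_path, runner=subprocess.run):
    """Run a query archive and write its output to output_path."""
    # Assumption: the client executable is called "dsmc" and is on the PATH.
    args = ["dsmc", "query", "archive", file_spec, "-subdir=yes"]
    with open(output_path, "w") as out:
        # Passing a list avoids shell expansion of wildcards in file_spec,
        # so the client receives the pattern unchanged.
        runner(args, stdout=out, check=True)
    return args
```

A call such as save_archive_listing("pathname/*", "archives.txt") would leave the listing in archives.txt for the user to read.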
Archive Performance Improvements
The GUI and Web clients use the archive description as the primary way to navigate to a specific archive. Because these descriptions are in text format and were held in the same primary archive table as the archived file path names, the search can take a long time. To speed up search performance, some of these search items are also held in secondary description tables. Archives that are invoked from the Web Client or GUI always use the secondary tables. Command line archives can be forced to use the secondary tables if they are 'converted' by using the CONVERT ARCHIVE command.
Use the CONVERT ARCHIVE command to convert archives that were run from the command line, so that their filespace and description data is stored in secondary tables to speed up searches. This is only appropriate if you run repeated archives over the same set of data, giving each archive a different description. If you use the command line for archives and retrieves, and do not use the description to identify archives, then do not convert the archives, which saves database space, and use the -v2Archive option with subsequent archive requests. Syntax:
CONVERT ARCHIVE nodename
Use the UPDATE ARCHIVE command to save on database space if your database has large numbers of archive entries, where large means 100,000 or more. This command should not be used if anyone uses, or may use, the Web Client or GUI to work with archived files. Syntax:
UPDate ARCHIve node_name -SHOWstats -RESETDescriptions -DELETEDirs
- SHOWstats: displays statistics for the node. The statistics include the number of directory and file entries, the number of entries for directories with the same path specification but different descriptions, and whether the node is converted.
- RESETDescriptions: resets the description field to the same description for all archive entries for a node. This means that every archive for a given directory will belong to the same package. Once the descriptions are changed, they cannot be restored.
- DELETEDirs: deletes all archive directory entries for the node. This means that the original access permissions cannot be provided when files are retrieved. This might not be as important as saving the database space. Once the directory entries are removed, they cannot be restored.
The UNDO ARCHCONVERSION command empties out the secondary description tables. It does not lose the archive directory or file data, as that is held in the primary archive table entries. You can either use this command on its own to free up the database space used by the secondary description tables, or you can follow it with the CONVERT ARCHIVE command to audit and refresh the secondary description tables. The syntax is
UNDO ARCHConversion node_name
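The two ways of using UNDO ARCHCONVERSION described above can be written down as a small helper that returns the administrative commands in the order they would be issued. This is a Python sketch only. The function name is hypothetical, the command text follows the syntax lines in this section, and nothing is sent to a server.

```python
def secondary_table_commands(node_name, reconvert=True):
    """Return the admin commands to empty, and optionally rebuild, the tables."""
    if not node_name or any(ch.isspace() for ch in node_name):
        raise ValueError("node name must be a single non-empty word")
    # Empty the secondary description tables for the node.
    commands = [f"UNDO ARCHCONVERSION {node_name}"]
    if reconvert:
        # Rebuild the secondary tables from the primary archive table.
        commands.append(f"CONVERT ARCHIVE {node_name}")
    return commands
```

With reconvert=False the helper returns only the UNDO ARCHCONVERSION command, which matches the case where the aim is just to free database space.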
Backing up Archived files
Spectrum Protect will always try to keep a backup of both a stub file and its associated migrated file. In some circumstances, this can mean that you back up a lot more data than you expected, as Spectrum Protect maintains its position after file changes. You can force Spectrum Protect to ignore migrated files with the skipmigrated option. The default for this is 'no', but if you set it to 'yes' then Spectrum Protect will not back up migrated files.
When the skipmigrated option is set to 'no', another parameter comes into play, checkreparsecontent.
If you set checkreparsecontent=yes, Spectrum Protect compares the content of the local stub file with the content in Spectrum Protect storage. If they are the same, the stub file is not backed up again but it will be if they are different.
If you set checkreparsecontent=no, Spectrum Protect will not do any stub file comparison, so a stub file is not backed up again even if its content has changed. This could mean that you do not have a valid stub file backup, but if you need to do a restore you should be able to recover the complete migrated file.
The content of stub files changed with HSM for Windows 6.2, so if you upgrade, you need to redo the backup of the stubs. You would also need to refresh the stub file backup if you move migrated files with the dsmmove.exe command or if you changed the file space that is used for migration. In these cases, you should set checkreparsecontent=yes and skipmigrated=no for the next backup, but consider changing them back once the stub backups are refreshed.
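The two settings for a stub refresh can be generated rather than typed by hand. The Python sketch below renders them as option lines. It assumes they are placed in the client options file in the usual "NAME value" form, and the function name render_stub_options is hypothetical.

```python
def render_stub_options(skipmigrated, checkreparsecontent):
    """Render the two options as client options file lines."""
    def yes_no(flag):
        return "YES" if flag else "NO"

    # Assumption: the client options file uses "OPTIONNAME value" lines.
    return "\n".join([
        f"SKIPMIGRATED {yes_no(skipmigrated)}",
        f"CHECKREPARSECONTENT {yes_no(checkreparsecontent)}",
    ]) + "\n"


# Settings suggested above for the backup that refreshes the stubs.
refresh_settings = render_stub_options(skipmigrated=False,
                                       checkreparsecontent=True)
```

Once the stub backups have been refreshed, the same helper can render whichever values were in use before.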
As stated earlier, Spectrum Protect always wants a complete backup copy of any migrated file. If no backup exists, Spectrum Protect will temporarily recall the migrated file and back it up. To prevent this process from interfering with the stubs, you can make it write to a temporary directory by setting the client option stagingdirectory.
If the backup-archive client cannot create a complete backup copy of the migrated file, the backup-archive client does not back up the stub file. For example, if the stub is an orphan with no migrated copy in Tivoli Storage Manager storage, the stub is not backed up.