How to use TSM for long-term data archives.
The Archive Command
Data archiving is intended to preserve a copy of a related set of files as they stood at a point in time for legal or compliance purposes. This set of files might consist of tax records, end-of-project reports or similar.
An Archive will typically be requested by the customer as a one-off process; it would not normally be a scheduled event. An Archive will usually be retained for several years.
If an Archive is required again, then normally the entire set of files is brought back to a new location, and the process is called 'Retrieve'.
An Archive is not the same as a Backup, which typically involves copying an unrelated set of changed files to tape and retaining them for a relatively short time. A Backup also does not affect the source data, whereas an Archive can delete the source data. Nor is an Archive the same as HSM migration, which involves moving older files off primary disk to cheaper storage. Migration is about managing disk space; Archive is about retaining data.
The TSM ARCHIVE command can be used in the following ways:
- ARCHIVE pathname.filename
- Will simply archive a file
- ARCHIVE pathname/* -deletefiles -subdir=yes
- Will archive files in a directory including subdirectories then delete them from source
- ARCHIVE -filelist=textfilename
- You create a text file that lists the files to be archived. The Archive command reads this list and archives the data. Each entry in the file list must include the full path name, so this can be used to archive selected files from different paths.
- ARCHIVE -filelist=textfilename -archmc=mc7yrs -deletefiles -description="End of year data for inland revenue, requested by Colin Green"
- Archive a list of files, bind the archive to a seven-year management class (one that you have previously set up with a seven-year retention), delete the originals and give the archive a meaningful description (254 characters maximum)
- ARCHIVE pathname.filename -v2Archive
- Use the v2archive option to generate secondary description tables - see the performance section below
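The file list variant above can be scripted. The following is a minimal sketch in Python, assuming a Unix-style client where the command line client is invoked as dsmc; the function names write_filelist and build_archive_command are hypothetical helpers, not part of TSM. It only builds the argument list and checks the 254 character description limit, and it does not run anything.

```python
from pathlib import PurePosixPath

MAX_DESCRIPTION_CHARS = 254  # limit quoted in the option list above


def write_filelist(paths, filelist_path):
    """Write one full path per line (hypothetical helper, Unix-style paths assumed)."""
    lines = []
    for p in paths:
        if not PurePosixPath(p).is_absolute():
            raise ValueError(f"file list entries need a full path name: {p}")
        lines.append(str(p))
    with open(filelist_path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    return len(lines)


def build_archive_command(filelist_path, mgmt_class=None, description=None,
                          delete_files=False):
    """Return the argument list for an archive run; nothing is executed here."""
    cmd = ["dsmc", "archive", f"-filelist={filelist_path}"]  # 'dsmc' is an assumption
    if mgmt_class:
        cmd.append(f"-archmc={mgmt_class}")
    if delete_files:
        cmd.append("-deletefiles")
    if description is not None:
        if len(description) > MAX_DESCRIPTION_CHARS:
            raise ValueError("description is longer than 254 characters")
        cmd.append(f"-description={description}")
    return cmd
```

The returned list could be handed to subprocess.run once it has been reviewed. Because -deletefiles removes the source data, it is probably worth testing a new file list without that option first.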
It is possible to archive data using the Web client GUI, but you get fewer options than with the command line.
The Retrieve Process
The following command line options are available to retrieve the data. The different options can be used in combination.
- RETRIEVE pathname.filename
- Will simply retrieve an archived file to the original location. You will be prompted if the data already exists
- RETRIEVE pathname/* newpathname/
- Retrieve all files in a directory to a new location
- RETRIEVE pathname/* -pick
- Get a pick list of archived files from a specified directory. You can then select those files you want retrieved.
- RETRIEVE -filelist=textfilename
- Retrieve a list of files that are specified in a file
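The retrieve variants above can be assembled in the same way. This is a small sketch in Python, assuming the command line client is invoked as dsmc; build_retrieve_command is a hypothetical helper name. It treats a file specification and a file list as alternatives, which is an assumption, and any other combination of options should be checked against the client documentation for your TSM level.

```python
def build_retrieve_command(source=None, destination=None, pick=False, filelist=None):
    """Return the argument list for a retrieve run; nothing is executed here."""
    # Assumption: exactly one of a file specification or a file list is given.
    if (source is None) == (filelist is None):
        raise ValueError("give either a source file specification or a file list")
    cmd = ["dsmc", "retrieve"]  # 'dsmc' is an assumption about the client name
    if filelist is not None:
        cmd.append(f"-filelist={filelist}")
    else:
        cmd.append(source)
    if destination is not None:
        cmd.append(destination)  # new location for the retrieved files
    if pick:
        cmd.append("-pick")  # ask for a pick list instead of retrieving everything
    return cmd
```

For example, build_retrieve_command("/projects/2005/*", "/restore/2005/") corresponds to the second form in the list, with both paths being placeholders.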
It is also possible to retrieve files with the GUI as shown below
Finding Archived files
To find individual archive runs you could use an SQL query:
select NODE_NAME,ARCHIVE_DATE,CLASS_NAME,DESCRIPTION from ARCHIVES where NODE_NAME='node'
One easy way to find archived files is to use the TSM Web Client GUI as shown above. Point your browser to http://servername:1581 and select the retrieve option. This will list all archives for that server, and you can drill down into each archive to find individual files.
If you need to provide a list of archived files, then from the command line you can use the QUERY ARCHIVE command and redirect the output into a file for perusal by a user.
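Saving the QUERY ARCHIVE output to a file can be sketched as follows. This is an illustrative Python example, assuming the command line client is invoked as dsmc and that -subdir=yes is wanted for the query; save_archive_listing is a hypothetical helper, and the run parameter exists only so the function can be exercised without a TSM client installed.

```python
import subprocess


def save_archive_listing(filespec, out_path, run=subprocess.run):
    """Run a query archive for filespec and write its output to out_path."""
    # 'dsmc' and '-subdir=yes' are assumptions; adjust them for your client.
    cmd = ["dsmc", "query", "archive", filespec, "-subdir=yes"]
    result = run(cmd, capture_output=True, text=True, check=True)
    with open(out_path, "w") as fh:
        fh.write(result.stdout)
    return cmd
```

A call such as save_archive_listing("/projects/2005/*", "archive_list.txt") would leave the listing in a text file that can be passed on to the user; both names are placeholders.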
Archive Performance Improvements
Tivoli introduced a couple of new archive commands in the April '06 update to TSM 5.2 and 5.3. These are only intended for sites that run thousands of archive operations and, as a result, have storage and performance issues.
The GUI and Web clients use the archive description as the primary way to navigate to a specific archive. Because these descriptions are in text format and were held in the same primary archive table as the archived file path names, a search could take a long time. To speed up search performance, some of these search items are also held in secondary description tables. Archives that are invoked from the Web Client or GUI always use the secondary tables. Command line archives can be forced to use the secondary tables by 'converting' them with the CONVERT ARCHIVE command.
Use the CONVERT ARCHIVE command to convert archives that were run from the command line, so that their filespace and description data is stored in secondary tables to speed up searches. This is only appropriate if you run repeated archives over the same set of data, giving each archive a different description. If you use the command line for archives and retrieves, and do not use the description to identify archives, then do not convert the archives, as this saves on database size. Use the -v2Archive option with subsequent archive requests instead. Syntax:
CONVERT ARCHIVE nodename
Use the UPDATE ARCHIVE command to save on database space if your database has large numbers of archive entries, where large means 100,000 or more. This command should not be used if anyone uses, or may use, the Web Client or GUI to work with archived files. Syntax:
UPDate ARCHIve node_name -SHOWstats -RESETDescriptions -DELETEDirs
- SHOWstats: statistics include the number of directory and file entries, the number of entries for directories with the same path specification but different descriptions, and whether the node is converted.
- RESETDescriptions: resets the description field to the same description for all archive entries for a node. This means that every archive for a given directory will belong to the same package. Once the descriptions are changed, they cannot be restored.
- DELETEDirs: deletes all archive directory entries for the node. This means that the original access permissions cannot be provided when files are retrieved. This might not be as important as saving the database space. Once the directory entries are removed, they cannot be restored.
The UNDO ARCHCONVERSION command empties out the secondary description tables. It does not lose the archive directory or file data, as that is held in the primary archive table entries. You can either use this command on its own to free up the database space used by the secondary description tables, or follow it with the CONVERT ARCHIVE command to audit and refresh the secondary description tables. The syntax is:
UNDO ARCHConversion node_name
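The guidance in this section can be summarised as a small decision helper. The sketch below is one reading of the text above and not an IBM recommendation; suggest_archive_maintenance and its parameters are hypothetical, and the command strings simply reuse the syntax shown in this section.

```python
LARGE_ENTRY_COUNT = 100_000  # "large" threshold quoted in the text above


def suggest_archive_maintenance(node, entry_count, uses_gui, uses_descriptions):
    """Return administrative commands worth considering for a node, as strings."""
    suggestions = []
    if uses_descriptions:
        # Sites that identify archives by description gain from the secondary tables.
        suggestions.append(f"CONVERT ARCHIVE {node}")
    elif entry_count >= LARGE_ENTRY_COUNT and not uses_gui:
        # Look at the statistics first; the other two options cannot be undone.
        suggestions.append(f"UPDATE ARCHIVE {node} -SHOWstats")
    return suggestions
```

An empty list means that, on this reading, none of the commands is clearly called for. RESETDescriptions and DELETEDirs are deliberately never suggested automatically, because their effects cannot be reversed.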