Requirements for an HSM product

There are a lot of Open Systems products out there that have HSM functionality. What sort of functions would you expect them to provide? Here is a list of a few.

Automatic migration

Migration should involve selecting files on expensive storage, replacing them with small 'stub files' used as placeholders, and moving the data to cheaper storage in the storage hierarchy. Data should be readily accessible, no matter where it is stored within the hierarchy.

With the advent of automatic storage tiering, and as hardware prices continue to fall, a simple HSM system that just moves data between storage devices probably does not add enough value to be worthwhile. Now, an HSM product needs to be policy based and used to manage the retention of data as well as its location. What I mean by this is that the product must have the facility to tag classes of data so that they can be automatically retained or deleted depending on business and regulatory requirements.
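To make the idea of tagging classes of data a bit more concrete, here is a minimal Python sketch of a retention policy table. The class names, the day counts and the function name action_for are all made up for illustration; a real product will have its own policy language.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    migrate_after_days: int
    delete_after_days: Optional[int]  # None means retain indefinitely


# Hypothetical data classes and retention periods, for illustration only.
POLICIES = {
    "financial": Policy(migrate_after_days=30, delete_after_days=7 * 365),
    "scratch": Policy(migrate_after_days=7, delete_after_days=90),
    "legal-hold": Policy(migrate_after_days=30, delete_after_days=None),
}


def action_for(data_class, age_days):
    """Return 'delete', 'migrate' or 'keep' for a file of the given class and age."""
    policy = POLICIES[data_class]
    if policy.delete_after_days is not None and age_days >= policy.delete_after_days:
        return "delete"
    if age_days >= policy.migrate_after_days:
        return "migrate"
    return "keep"
```

With this table, a 100-day-old 'scratch' file would be deleted, while a 'legal-hold' file of any age is only ever migrated, never deleted.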

You can base migration on policy, thresholds, or a combination of the two.
HSM will migrate data when the amount of data on a disk exceeds a preset 'high threshold', expressed as a percentage of total disk capacity. HSM should stop the migration procedure when the specified 'low threshold' on the client has been reached.
Policy-based migration is more difficult, as it relies on you identifying and classifying all of your data, either by filename or by 'metadata', and using that to decide when a file should be migrated or deleted. You would set up different policies, depending on how important the data is.
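The high and low threshold behaviour described above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the names FileInfo and select_for_migration are hypothetical, candidates are picked least recently accessed first, and the small amount of space taken by the stub files is ignored.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    size: int           # bytes
    last_access: float  # seconds since the epoch


def select_for_migration(files, capacity, high_pct, low_pct):
    """Return the files to migrate, least recently accessed first.

    Nothing is selected until usage exceeds the high threshold. Selection
    then continues until usage has dropped to the low threshold.
    """
    used = sum(f.size for f in files)
    if used * 100 <= high_pct * capacity:
        return []
    selected = []
    for f in sorted(files, key=lambda f: f.last_access):
        if used * 100 <= low_pct * capacity:
            break
        selected.append(f)
        used -= f.size
    return selected


files = [
    FileInfo("/data/a", 40, 100.0),
    FileInfo("/data/b", 30, 300.0),
    FileInfo("/data/c", 20, 200.0),
]
# The disk is 90% full, so with an 80% high threshold migration starts,
# and it stops once usage is down to the 50% low threshold.
to_migrate = select_for_migration(files, capacity=100, high_pct=80, low_pct=50)
```

A policy-based product would replace the simple sort with whatever classification rules you have defined, but the stop and start logic would look much the same.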

Automatic recall of files

All migrated files should appear to the user and applications to still reside on the primary disk. When a migrated file is accessed, the storage management solution should automatically and transparently move it directly from the secondary storage back to the primary disk.

System utilities

All backup, disk monitoring, and virus scanning tools should ignore migrated files. They should not try to open the files, and they should not reset the 'last accessed date', which is used to determine if a file is eligible for migration or not.
The 'advanced text search' facility in Windows is a particular issue, as it scans every file in a directory looking for a word or phrase. Migrated files should be tagged in such a way that the advanced text search does not attempt to open them.
Some products offer alternatives to the Windows Explorer search, such as indexing or search solutions which maintain a database of keywords from the files in specified folders on a server, and an associated search engine. This searches the database instead of the documents themselves, and so does not recall migrated files until the correct file is located. The benefits are twofold: (1) the search runs significantly faster and (2) it does not affect migrated or compressed files.
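As a rough illustration of why an index avoids recalls, here is a minimal Python sketch of a keyword database. The function names build_index and search are hypothetical, and the index here is just an in-memory dictionary built once, while the files are still resident.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of paths that contain it."""
    index = defaultdict(set)
    for path, text in documents.items():
        for word in text.lower().split():
            index[word].add(path)
    return index


def search(index, word):
    """Look the word up in the index; no document is opened, so nothing is recalled."""
    return sorted(index.get(word.lower(), set()))


documents = {
    "a.txt": "Tape recall policy",
    "b.txt": "disk policy",
}
index = build_index(documents)
```

A search for 'policy' is answered from the index alone, and only the file the user finally opens would need to be recalled.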

User Interface

A user application will typically 'hang' until a migrated file is successfully recalled to disk. If a file is migrated to cheap disk, you would expect the recall to take a few seconds, so this is not an issue. However, if a file is migrated to tape and several recalls are queued for the same tape, or tape drives are not available, a recall could take a few minutes. The HSM system should provide the following facilities.

HSM should tell the user if a file has been migrated, either by changing the icon, or by adding an extension to the file name.
Disk recalls should be transparent, as 'recall' messages would probably just annoy the users.
When recalling from tape, the user should be given a message that the file has been migrated, and given the option to continue with the recall, or cancel.
One simple way to achieve this is to provide a recall timeout parameter, say 10 seconds, and if the recall takes more than 10 seconds, give the user a message, tell them what is going on, and ask whether they want to continue.
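The recall timeout idea could look something like the following Python sketch. The names recall_with_prompt, ask_user and slow_tape_recall are hypothetical, and the recall itself is simulated with a short sleep. Note that Python cannot cancel a thread that is already running, so a real product would need to tell the HSM server to drop the queued request.

```python
import concurrent.futures
import time


def recall_with_prompt(recall, ask_user, timeout=10.0):
    """Run recall(); if it outlasts `timeout` seconds, ask the user whether to wait.

    Returns the recalled data, or None if the user cancels.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(recall)
    try:
        try:
            # Fast (disk) recalls finish here and the user sees no message.
            return future.result(timeout=timeout)
        except concurrent.futures.TimeoutError:
            if ask_user("This file has been migrated and the recall is slow. Continue?"):
                return future.result()  # keep waiting, with no time limit
            future.cancel()  # only stops work that has not started yet
            return None
    finally:
        pool.shutdown(wait=False)


def slow_tape_recall():
    time.sleep(0.2)  # stands in for waiting on a tape mount
    return b"file contents"
```

Here ask_user is any function that shows the message and returns True or False, so the same logic could sit behind a dialog box or a command line prompt.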

Stub file management

Backup utilities should be able to back up and recover stub files without recalling the file. It should also be possible to move stub files to a different directory without needing to recall the file. This is a problem, as file names must include the full path to make them unique. For example, I can have a file called hsm.doc in three different directories. If all three are migrated, the only way to ensure you recall the correct one is to include the full path name when migrating them. If you move a stub file to a different directory, then its unique name has changed, and recalls that use the path of the stub to locate the migrated file will fail. The solutions to this issue involve either replacing the path name to the file with some sort of unique catalog key, which becomes part of the file metadata, or including the full path name to the archived data inside the stub file.
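The catalog key approach can be sketched in Python as follows. Everything here is a stand-in: the Catalog class, the use of a random UUID as the key, and the dictionary that plays the part of the file system are assumptions made for illustration, not how any particular product stores its stubs.

```python
import uuid


class Catalog:
    """Maps a unique key to the migrated data (standing in for secondary storage)."""

    def __init__(self):
        self._objects = {}

    def migrate(self, data):
        key = uuid.uuid4().hex  # unique regardless of the file's path
        self._objects[key] = data
        return key

    def recall(self, key):
        return self._objects[key]


catalog = Catalog()
filesystem = {}  # path -> stub, where a stub is a dict holding the catalog key


def migrate_file(path, data):
    filesystem[path] = {"catalog_key": catalog.migrate(data)}


def move_stub(old_path, new_path):
    # Only the stub moves; the key inside it is unchanged, so no recall is needed.
    filesystem[new_path] = filesystem.pop(old_path)


def recall_file(path):
    return catalog.recall(filesystem[path]["catalog_key"])


# Three files with the same name in different directories.
migrate_file("/sales/hsm.doc", b"sales copy")
migrate_file("/legal/hsm.doc", b"legal copy")
migrate_file("/it/hsm.doc", b"it copy")
move_stub("/legal/hsm.doc", "/archive/hsm.doc")
```

Because the recall goes through the key rather than the path, the moved stub still finds the right data.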

Integration with backup and archive functions

Migration is not a replacement for data backup and archive. The purpose of backup and archive is to enable recovery of lost or inaccessible data and retrieval of point-in-time stored data. The purpose of HSM is to free space on higher cost, higher performance storage. Ideally, your HSM and backup software should work together. A storage management solution should have the ability, if desired by the user, to verify that backup copies of data exist before data can be migrated. This helps ensure data protection is not overlooked. In addition, a storage management solution should reduce network traffic and backup time as much as possible. When backing up or archiving a file that has been migrated, the storage management solution should copy that file directly from the server migration storage to the server backup storage. The file should not be transferred across the network again.

Read without recall

It is sometimes possible for a user to access needed information without recalling the file. A stub file can store control information and 'n' bytes of data. When a file is accessed and the information is contained within the stub file, the file is not recalled. Some products also allow you to change the recall mode for a migrated file to read-without-recall. This is especially useful when large migrated files contain reference information. The HSM product reads information from a migrated file sequentially and caches it into a memory buffer on the user's workstation or file server. This eliminates the need to store the file on the local file system, reducing network traffic, access time, need to find free space on the local hard drive, and processing for the HSM server.
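The first case, where a read is satisfied from the 'n' bytes held in the stub, can be illustrated with a small Python sketch. The MigratedFile class and its recalls counter are hypothetical, and secondary storage is simulated by keeping the full data in memory.

```python
class MigratedFile:
    """A migrated file whose stub keeps the first `stub_bytes` bytes of data."""

    def __init__(self, data, stub_bytes):
        self._stub = data[:stub_bytes]  # leading bytes kept in the stub
        self._remote = data             # stands in for secondary storage
        self.recalls = 0                # how often secondary storage was touched

    def read(self, offset, length):
        # If the request lies entirely inside the stub, answer it locally.
        if offset + length <= len(self._stub):
            return self._stub[offset:offset + length]
        # Otherwise the data has to come from secondary storage.
        self.recalls += 1
        return self._remote[offset:offset + length]


content = b"HEADER" + b"x" * 100
f = MigratedFile(content, stub_bytes=6)
header = f.read(0, 6)          # served from the stub
recalls_after_header = f.recalls
body = f.read(4, 10)           # runs past the stub, so storage is read
```

An application that only inspects a file header, for example to identify the file type, would never trigger a recall with a stub sized like this.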

Migrate-on-close

This mode is a variation of the second read-without-recall option above. The HSM product recalls a migrated file back to its originating file system and it remains there only while it is open. If the file has not been modified when it is closed, HSM replaces it with a stub file. HSM does not need to send a copy of the file to the HSM server storage again because the file has not been modified and the copy that currently resides in HSM server storage is still valid. This reduces network traffic and processing for the HSM server. This is also a solution to the 'runaway search' issue. A Windows text search will open each file in turn, scan it for your selected text, then close the file. Migrate-on-close means that a text search will only have one file recalled at a time.
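Here is a minimal Python sketch of migrate-on-close, including the text search case. The HsmClient class is hypothetical, storage is simulated with dictionaries, and a modified file is simply left resident on local disk, which is my assumption since products may handle that case differently. Detecting changes by comparing contents is also a simplification.

```python
from contextlib import contextmanager


class HsmClient:
    def __init__(self, server):
        self.server = server  # path -> bytes held on HSM server storage
        self.local = {}       # path -> bytearray for files currently recalled

    @contextmanager
    def open(self, path):
        self.local[path] = bytearray(self.server[path])  # recall on open
        original = bytes(self.local[path])
        try:
            yield self.local[path]
        finally:
            if bytes(self.local[path]) == original:
                # Unmodified: replace with a stub again. The server copy is
                # still valid, so nothing is sent back over the network.
                del self.local[path]
            # Modified files stay resident (an assumption in this sketch).


client = HsmClient({"a.txt": b"alpha", "b.txt": b"beta"})
max_resident = 0
hits = []
# A text search opens each file in turn, scans it, then closes it.
for path in sorted(client.server):
    with client.open(path) as data:
        max_resident = max(max_resident, len(client.local))
        if b"beta" in data:
            hits.append(path)
```

During the search at most one file is resident at any time, and once it finishes every file is back to being a stub.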
