As discussed on the previous page, there are two fundamental types of virtual tape: Virtual Tape Augmented (VTA), where data is written into a disk cache and then consolidated onto physical tape, and Virtual Tape Elimination (VTE), where data is written to disk and stays there until it is deleted.
The basic principles of VTA are illustrated in the gif below.
Virtual tape vendors use different names for their virtual tape components, but the components are basically the same, although each vendor implements them differently.
The diagram shows 16 virtual drives, which are emulated in software. The operating system sees these as real drives, mounts virtual tapes on them and writes data out as virtual tapes. The data is not written directly to tape, but is staged, usually in compressed format, to a disk cache. When there is enough data in the cache, the vtape system copies the data from the cache to a real tape drive, onto high capacity real tapes. It then either frees up the space in the cache or marks it as available for reuse when required, depending on the implementation and your settings. It is usually best practice to keep the disk cache full and reuse space as required. That way, if you need to read a tape which was written recently, it may already be in the cache.
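The cache behaviour described above can be sketched in a few lines of Python. This is a toy model only, not any vendor's implementation: the `DiskCache` class, its method names and the sizes used are all hypothetical. It keeps copied virtual tapes in the cache and only overwrites the oldest copied ones when a new tape needs the space.

```python
from collections import OrderedDict


class DiskCache:
    """Toy model of a VTA disk cache (hypothetical names and sizes)."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        # volser -> (size_gb, copied_to_physical); oldest entries come first
        self.tapes = OrderedDict()

    def used_gb(self):
        return sum(size for size, _ in self.tapes.values())

    def write(self, volser, size_gb):
        """Stage a virtual tape, overwriting old copied tapes if space is short."""
        self.tapes.pop(volser, None)  # a rewritten volser replaces its old copy
        while self.used_gb() + size_gb > self.capacity_gb:
            # Only tapes already copied to physical tape are safe to overwrite.
            victim = next(
                (v for v, (_, copied) in self.tapes.items() if copied), None
            )
            if victim is None:
                raise RuntimeError("cache full: no copied tapes left to overwrite")
            del self.tapes[victim]
        self.tapes[volser] = (size_gb, False)

    def copy_to_physical(self, volser):
        """Mark a virtual tape as safely copied to a stacked physical tape."""
        size_gb, _ = self.tapes[volser]
        self.tapes[volser] = (size_gb, True)

    def read(self, volser):
        """Report whether a read is served from cache or needs a recall."""
        if volser in self.tapes:
            return "cache hit"
        return "recall from physical tape"
```

In this sketch a tape that has been copied stays readable from the cache until its space is actually needed, which is the reason for keeping the cache full rather than emptying it after every copy.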
The principle behind VTE follows the same initial process, but the data remains on disk, so you need a lot more disk space.
While VTA can use deduplication, it is essential for VTE to lower the disk space requirements. Deduplication works by checking large sequences of bytes and looking for duplicates. The byte sequence can be an entire file, a fixed-size chunk or a variable-length sliding window. The last approach is the most efficient at finding duplicates, but it also uses the most CPU. Deduplication can happen at the client, so that less data is transmitted to the backup server, or all the data can be transmitted and stored on the backup server and then deduplicated later to cut down on disk usage. These are called pre-deduplication and post-deduplication. Vendors claim that deduplication can reduce storage use by 90% or more. Post-deduplication can be a problem, as the backup data might not be available while deduplication is running, which in turn could delay recoveries.
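As a rough illustration of the fixed-size chunk approach, the sketch below splits a byte stream into chunks, hashes each chunk and stores each unique chunk only once. The function names, the 4096-byte chunk size and the choice of SHA-256 are assumptions made for this example; real products use their own chunking and fingerprinting schemes.

```python
import hashlib


def dedup_fixed_chunks(data: bytes, chunk_size: int = 4096):
    """Deduplicate a byte stream using fixed-size chunks.

    Returns (store, recipe): store maps a chunk digest to the chunk bytes,
    recipe is the ordered list of digests needed to rebuild the stream.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep only the first copy of a chunk
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original stream from the chunk store and the recipe."""
    return b"".join(store[digest] for digest in recipe)
```

A fixed chunk size is cheap to compute, but inserting a single byte near the start of a file shifts every later chunk boundary, so none of the later chunks match any more. That weakness is what the more CPU-hungry variable-length sliding window approach addresses.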
The individual components, and some tips on how to use them, are described below.
Virtual tape drives are defined to the operating system just like ordinary drives. From the viewpoint of the host operating system, they look just like real drives, and will accept all the commands, and return all the conditions that a normal drive would.
Virtual tapes need to be 'initialised', or given a logical label to identify them, just like real tapes. The tape labels are held in a VTS catalog, so a virtual tape does not need to be recalled from physical tape to scratch it. All the data required for label checking exists in that catalog.
On an IBM mainframe, data is directed to an IBM VTS by DFSMS constructs, and this can cause problems with foreign tapes. Suppose you define a virtual tape range from V00000 to V49999. Your favourite supplier then sends you a fix tape with a volser of V23456 and you submit a job to read it. DFSMS will intercept your request, divert the allocation to a virtual tape drive and mount its own virtual tape, not your real physical tape. This will happen even if you code a specific UNIT=xxxx override to try to force the allocation to a specific non-virtual drive. However, nil desperandum, you can force your allocation, but you need to use a special DFSMS storage class, STORCLAS=DUPT@SMS.
//INPUT DD DISP=SHR,DSN=CAI.SAMPJCL,
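To see why the foreign tape in the example above is intercepted, it can help to check a volser against the virtual range before submitting the job. The Python helper below is purely illustrative; the function name is hypothetical, the range is the one used in the example, and it does not reproduce how DFSMS itself makes the decision.

```python
def in_virtual_range(volser: str, low: str = "V00000", high: str = "V49999") -> bool:
    """Return True if a 6-character volser falls inside the virtual tape range.

    Plain string comparison is enough here because every volser has the
    same length, so character order matches range order.
    """
    return len(volser) == 6 and low <= volser <= high
```

With the example range, `in_virtual_range("V23456")` is true, so the supplier's tape clashes with a virtual volser, while a tape labelled `V50000` would not.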
On a VTA system, the disk cache is a buffer which holds the virtual tapes, at least until they are written out to physical tapes. A virtual tape is usually held in the cache even after it has been written to a physical tape, so if it is required for read again, there is no need for a physical tape mount. On an IBM VTS, the disk cache is best kept full, with older, copied virtual tapes overwritten by new ones as space is required. An Oracle/STK VSM will issue warning messages once it hits a capacity warning threshold. At this point, automated space reclamation should kick in, but don't ignore the warnings. The VSM and an EMC Data Domain both need some free space to run reclamation. If they do fill up, they will grind to a halt, and you will need to delete any empty tapes that may exist on the system, then expire enough data to get the system below 80% capacity.
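The recovery arithmetic is simple, but it is worth doing before you start expiring data. The sketch below, with a hypothetical function name and made-up figures, works out how much data must go to bring a system back to the 80% mark mentioned above.

```python
def gb_to_expire(used_gb: float, capacity_gb: float, target_pct: float = 80.0) -> float:
    """Return the amount of data (GB) to expire to reach target_pct of capacity.

    Returns 0.0 if the system is already at or below the target.
    """
    target_gb = capacity_gb * target_pct / 100.0
    return max(0.0, used_gb - target_gb)
```

For a 100 GB system holding 95 GB, `gb_to_expire(95, 100)` gives 15.0, so at least 15 GB must be expired, and a little more to get properly below the threshold.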
The EMC Data Domain is a VTE system so all the data is stored in a disk cache. If you use LAN free backups, then it is advisable to have this disk cache SAN attached, then the data can flow direct from the server disk to the Data Domain disk over the SAN.
Physical tape drives are usually high capacity models such as the IBM TS1140, the SUN T10000C or D, or LTO-6 or LTO-7. The physical tape drives are not defined to the operating system; they are only known to the Vtape system.
On IBM systems, physical tapes cannot be ejected from the virtual tape library until all the data has been removed from them. If someone asks you to eject a virtual tape, send them an empty envelope (yes, it does happen). Seriously, there will be occasions when you might want to send data offsite, so you need to work out some way to do this, maybe by keeping a couple of standalone physical drives.
Initiators are used by Data Domain. An initiator is an HBA port that is attached to a backup client. The VTL side of the connection is called the target port, and the client side the initiator port. The FC initiator port must be dedicated to Data Domain VTL devices only, and once it is zoned in correctly at the SAN, the WWPN of the client will be visible at the Data Domain VTL. You can rename or alias the initiators, and it is best to define a good set of naming standards that relate them back to the individual clients, as otherwise you have to work out which client each WWPN belongs to.
The controlling software looks after the disk cache, and also cleans up deleted data from real tapes.
Reconciliation: checks for invalidated volumes, i.e. virtual volumes which have been re-written. Reconciliation is best run before Reclamation.
Reclamation: applies to VTA systems and consolidates stacked volumes which contain expired virtual volumes. Reclamation requires scratch volumes and free drives, so it must be run at a time when recalls are minimal. Keep the reclamation percentage low, between 10 and 30%, otherwise reclamation will run for too long and not achieve much.
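The effect of the reclamation percentage can be sketched as follows. This example assumes the percentage means the amount of still-valid data left on a stacked volume, so a volume becomes eligible once its valid data drops below the threshold; check how your own system defines it. The function name and volsers are hypothetical.

```python
def reclaim_candidates(stacked_volumes: dict, threshold_pct: float = 20) -> list:
    """Pick stacked volumes eligible for reclamation.

    stacked_volumes maps a physical volser to the percentage of the tape
    that still holds valid (unexpired) virtual volumes. Volumes below the
    threshold are returned emptiest first, since they are the cheapest to
    consolidate.
    """
    eligible = [v for v, pct in stacked_volumes.items() if pct < threshold_pct]
    return sorted(eligible, key=lambda v: stacked_volumes[v])
```

Under that assumption, a low threshold means only nearly empty tapes are reclaimed, so each reclaim copies little data and frees a whole tape. A high threshold makes many more tapes eligible, each of which needs a lot of valid data copied off it.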
The consensus now is that you can put any kind of existing tape data onto virtual tape. In practice, some kinds of data are better than others. Also, virtual tape will provide a lot more drives, and that can benefit all applications. As always, the choice is up to you. Some things to consider are
Virtual tape requires a different type of performance monitoring than real tape. There are three areas to watch
There are tools available to help with this. An example is Perfman for Tape Libraries, which will collect virtual tape performance data and report on it.