Finding the HSM started task and parameter library
The HSM startup JCL will be in one of your JES proclibs; which one is site dependent. If you use JES2, SYS1.PROCLIB(JES2) will contain a list of all allocated procedure libraries.
The startup parameters are usually in member ARCCMD00 in SYS1.PARMLIB, but your site may well have changed this. The startup JCL will contain
procname PROC CMD=xx
where xx are the two characters that follow ARCCMD in the member name.
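As an illustration of that mapping, here is a minimal fragment. The procedure name DFHSM and the value 01 are hypothetical, and a real startup procedure carries more keywords and DD statements than shown.

```jcl
//* Sketch only: DFHSM and CMD=01 are assumed values for illustration.
//DFHSM    PROC CMD=01
//* CMD=01 means the startup commands are read from member ARCCMD01
//* in the parameter library that the procedure allocates.
```

With this setting you would look for member ARCCMD01 rather than ARCCMD00.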
How do I duplex HSM tapes?
HSM tape duplexing is controlled by a parameter in the SYS1.PARMLIB member, usually ARCCMD00. The parameter is simply
SETSYS DUPLEX(BACKUP(N) MIGRATION(Y))
In this example, we are duplexing the migration tapes but not the backup tapes.
Checking the activity log
The activity logs are a good place to look for started task problems, but you can't browse them while HSM has exclusive use of them. Issue the HSM command
RELEASE HARDCOPY
to close the activity logs that have data in them so that you can view them (the command has no effect on logs that are empty). If ACTLOGTYPE is specified as SYSOUT, the logs are printed to the spool.
If the activity logs are allocated to DASD, no printing occurs, but the current logs are closed and new logs are allocated to HSM.
To see useful data in the logs, you need to have set the following parameter
SETSYS ACTLOGMSGLVL(FULL)
This logs the results of all commands, successes and failures, and is the best option, though the logs can get big. The alternatives are EXCEPTIONONLY (failures only) and REDUCED (level 0 movement only).
You can control where the log goes with the parameter
SETSYS ACTLOGTYPE(SYSOUT(class))
or
SETSYS ACTLOGTYPE DASD UNIT=diskesoteric SPACE=(TRKS,(n,m))
The disk logs will be called sitehlq.Hhostid.function.Dyyddd.Thhmmss
where hostid is the HSM host ID from the PROC statement, and function is CMDLOG (command), BAKLOG (backup), DMPLOG (dump), or MIGLOG (migration).
HSM started task abends
There are two ways of getting problem information: from dumps of the started task, or from PDA traces. If HSM abends and is then restarted, VSAM error messages are issued when HSM attempts to open the CDSs. A VERIFY command is issued and the open is retried, which usually succeeds, and HSM then normally continues processing.
However, if HSM cannot initialize successfully, this probably means that one of the control data sets and/or the journal was damaged during the abnormal end. To fix this:
Analyze the dump resulting from the abnormal end to determine what is damaged.
If it is a CDS:
  Recover it via an IDCAMS IMPORT command from the most recent backup copy.
  Start HSM.
  Issue the UPDATEC command to combine the latest transactions in the journal data set with the restored backup copy of the control data set.
If it is the journal:
  Stop HSM.
  Reallocate a new journal data set.
  Restart HSM.
  Back up the CDSs.
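The CDS restore step above might look something like the job below. This is a sketch under stated assumptions, not a tested recovery job: the job card, the cluster name HSM.MCDS and the backup name HSM.MCDS.BACKUP.V0000001 are all hypothetical, and it assumes the backup copy was written in a format that IDCAMS IMPORT can read. HSM should be down on every host that shares the CDS while it runs.

```jcl
//CDSIMP   JOB (ACCT),'IMPORT MCDS',CLASS=A,MSGCLASS=X
//* Sketch only: all data set names below are assumed examples.
//* Restore the MCDS from its most recent backup copy.
//IMPORT   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//BACKUP   DD DSN=HSM.MCDS.BACKUP.V0000001,DISP=SHR
//SYSIN    DD *
  IMPORT INFILE(BACKUP) -
         OUTDATASET(HSM.MCDS)
/*
```

Depending on the state of the damaged cluster, you may need to delete and redefine it before the IMPORT will succeed, so check the IDCAMS messages in SYSPRINT. Once the restore is clean, start HSM and issue UPDATEC as described in the steps above.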
If both a CDS and the journal are corrupt, then you probably can't do a clean recovery. You might need to use the AUDIT or FIXCDS commands to correct the records in the CDSs. AUDIT lets the system cross-check the various records concerning data sets and HSM resources. AUDIT can list errors and propose diagnostic actions or, at your option, complete most repairs itself.
Auditing a CDS
Consider using the AUDIT command for the following reasons:
After any CDS restore (highly recommended)
After an ARC184I message (error when reading/writing DFSMShsm CDS records)
Errors on the RECALL or DELETE of migrated data sets
Errors on BDELETE or RECOVER of backup data sets
DFSMShsm tape-selection problems
RACF mismatch messages
Power or hardware failure
You can use AUDIT to cross-check the following sources of control information:
MCDS or individual migration data set records
BCDS or individual backup data set records or ABARS records
OCDS or individual DFSMShsm-owned tapes
DFSMShsm-owned DASD volumes
Migration-volume records
Backup-volume records
Recoverable-volume records (from dump or incremental backup)
Contents of SDSP data sets
It is best to use AUDIT at times of low system activity, as some audit processes can run for quite some time. Be aware that an audit will lock out some HSM activity.
SMS-managed volumes cannot be audited with the AUDIT command, but the process for auditing SMS-managed data sets is similar to that for auditing non-SMS-managed data sets.
If AUDIT is executing and a backup of the HSM CDSs is started (via the BACKVOL CDS command or AUTOBACKUP), all HSM functions on the host that started the backup are halted until both the AUDIT function and the backup of the HSM CDSs have completed.
Diagnostic Patches
You can enable or disable dumps for diagnostics with the following patches
When HSM appears to be hanging, check the following before resorting to canceling HSM.
Across all LPARs, check whether there are any outstanding messages that require a response. If all your LPARs are in a Sysplex, then the SDSF command
D R,R
will display all outstanding prompts for all LPARs.
Check across all LPARs whether HSM is waiting on tape drives and, if so, prioritise making them available.
Check whether any reserves or enqueues are causing HSM to queue. How you do this depends on what monitoring software you have installed on site.
Check across all LPARs for active HSM tasks by issuing
F xxxxxx,Q ACT
to see whether HSM activity such as AUDIT or TAPECOPY on one LPAR is impacting activity on another LPAR.
Issue F xxxxxx,HOLD ALL across all LPARs (or HOLD AUDIT, HOLD TAPECOPY, etc.) and wait a few minutes to see whether HSM activity returns to normal (this could potentially take 10 minutes or longer).
If for any reason you cannot enter HSM commands from TSO, you can use the MVS MODIFY command from the system console. All modify commands start with F followed by your started task name. The commands below assume the task is called HSM.
F HSM,SWAPLOG
F HSM,CANCEL USERID(userid)
F HSM,SETSYS etc
F HSM,CANCEL DATASETNAME(datasetname)
F HSM,CANCEL REQUEST(requestnumber)
VSAM Record Level Sharing is explained in some detail in the VSAM RLS section. If you run multiple DFHSM tasks on multiple LPARs, then RLS can help DFHSM performance, as the individual tasks enqueue the CDS files at record level instead of cluster level, allowing concurrent reads and updates.
To use RLS you need to work through the following tasks
The control datasets must be SMS managed
The control datasets cannot be split by key ranges. This technique was commonly used to cope with very large files, especially the MCDS, but is not supported for z/OS release 1.3 and above anyway. Alternative options are to use extended addressing or multi-cluster CDS files.
The CDS datasets must also be flagged as non-recoverable by using the IDCAMS ALTER command with the LOG(NONE) parameter.
If you are already running RLS, then you need to check that the cache structure is at least 1MB. Otherwise you need to set up RLS as explained in the VSAM RLS section.
Finally, you need to add CDSSHR=RLS to all the DFHSM catalogued procedures, then recycle them.
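The LOG(NONE) task in the list above can be done with a short IDCAMS step along these lines. This is a sketch: the cluster names HSM.MCDS, HSM.BCDS and HSM.OCDS are hypothetical, so substitute your own CDS names, and include every cluster if you run multi-cluster CDSs.

```jcl
//* Sketch only: the three cluster names are assumed examples.
//* Flag each CDS as non-recoverable so it is eligible for RLS.
//ALTERCDS EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  ALTER HSM.MCDS LOG(NONE)
  ALTER HSM.BCDS LOG(NONE)
  ALTER HSM.OCDS LOG(NONE)
/*
```

A LISTCAT of each cluster afterwards is a reasonable way to confirm that the attribute was set before you add CDSSHR=RLS to the procedures.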