Problems with the HSM started task


Finding the HSM started task and parameter library

The HSM startup JCL will be in one of your JES procedure libraries; which one is site dependent. If you use JES2, SYS1.PROCLIB(JES2) will contain a list of all allocated procedure libraries.

The startup parameters are usually in member ARCCMD00 in SYS1.PARMLIB, but your site may well have changed this. The startup JCL will contain

procname PROC CMD=xx

where xx are the two characters after ARCCMD
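
As a rough sketch of how these pieces fit together, a heavily trimmed startup procedure might look like the one below. The procedure name HSM is an assumption, and a real procedure carries many more PARM keywords and DD statements; the only point here is that CMD=00 selects member ARCCMD00 from the library on the HSMPARM DD.

```jcl
//HSM      PROC CMD=00                  SUFFIX OF THE ARCCMDXX MEMBER
//HSM      EXEC PGM=ARCCTL,PARM=('CMD=&CMD')
//HSMPARM  DD DSN=SYS1.PARMLIB,DISP=SHR
```

If your procedure points HSMPARM at a different library, that is where to look for the ARCCMDxx member.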


How do I duplex HSM tapes?

HSM tape duplexing is controlled by the SETSYS DUPLEX parameter in the SYS1.PARMLIB member, usually ARCCMD00. The parameter has BACKUP and MIGRATION subparameters, each set to Y or N, so you can, for example, duplex the migration tapes but not the backup tapes.
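
A statement matching that description, assuming the standard SETSYS DUPLEX syntax, would look something like this in the ARCCMDxx member:

```text
/* Duplex the migration tapes, keep single copies of backup tapes */
SETSYS DUPLEX(BACKUP(N) MIGRATION(Y))
```

Swapping the Y and N values would duplex backup tapes instead, and setting both to Y would duplex both.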


Checking the activity log

A good place to look for started task problems is the activity logs, but you can't browse them while HSM has exclusive use of them. Issue the HSM command RELEASE HARDCOPY to view activity logs with data in them (the command has no effect on logs that are empty). If ACTLOGTYPE is specified as SYSOUT, the logs are printed to the spool.
If the activity logs are allocated to DASD, printing does not occur, but the current logs are closed and new copies of the logs are allocated to HSM.
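
The behaviour described above matches the RELEASE HARDCOPY command. As a sketch, it could be entered from TSO with HSENDCMD (first line) or from the system console with MODIFY (second line); the started task name HSM is an assumption:

```text
HSENDCMD RELEASE HARDCOPY
F HSM,RELEASE HARDCOPY
```

With DASD logs, the closed log data sets can then be browsed in the normal way.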

To see useful data in the logs, you need to have set the SETSYS ACTLOGMSGLVL(FULL) parameter.

This shows the results of all commands, successes and failures, and is the best option, though the logs could be big. The alternatives are EXCEPTIONONLY, which logs failures only, and REDUCED, which logs Level 0 movement only.

You can control where the log goes with the SETSYS ACTLOGTYPE parameter, which directs the logs either to DASD or to SYSOUT.

The disk logs will be called sitehlq.Hhostid.function.Dyyddd.Thhmmss, where Hhostid is the HSM host ID from the PROC statement, and function is CMDLOG (command), BAKLOG (backup), DMPLOG (dump), or MIGLOG (migration).
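
The two logging settings discussed in this section correspond to the SETSYS ACTLOGMSGLVL and ACTLOGTYPE parameters. A minimal ARCCMDxx fragment, assuming you want full logging written to disk, might be:

```text
/* Log the result of every command, successes as well as failures */
SETSYS ACTLOGMSGLVL(FULL)
/* Write the activity logs to DASD rather than to the spool */
SETSYS ACTLOGTYPE(DASD)
```

If you would rather have the logs on the spool, ACTLOGTYPE takes SYSOUT with an output class instead of DASD.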


HSM started task abends

There are two ways of getting problem information: from dumps of the started task, or from PDA traces. If HSM abends and is then restarted, VSAM error messages are issued when HSM attempts to open the CDSs. A VERIFY command is issued and the open is retried; this is usually successful, and HSM then hopefully continues processing normally.
However, if HSM cannot initialize successfully, this probably means that one of the Control Datasets and/or the journal has been damaged during the abnormal end. To fix this -

If both a CDS and the journal are corrupt, then you probably can't do a clean recovery. You might need to use the AUDIT or FIXCDS commands to correct the records in the CDSs. AUDIT lets the system cross-check the various records concerning data sets and HSM resources. It can list errors and propose diagnostic actions or, at your option, complete most repairs itself.


Auditing a CDS

Consider using the AUDIT command for the following reasons:

You can use AUDIT to cross-check the following sources of control information:

It is best to use AUDIT at times of low system activity, as some audit processes can run for quite some time. Be aware that an audit will lock out some HSM activity.

SMS-managed volumes cannot be audited with the AUDIT command. The process for auditing SMS-managed data sets, however, is similar to that for auditing non-SMS-managed data sets.

If AUDIT is executing and a backup of the HSM CDSs is started (via the BACKVOL CDS command or AUTOBACKUP), all HSM functions on the host that started the backup are halted until both the AUDIT function and the backup of the CDSs have completed.
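
As an illustration only, a report-only audit of the migration data set records might be requested from TSO as shown below. The output data set name HSM.AUDIT.MIGDS is hypothetical, and the keywords are written as I understand the AUDIT syntax, so check them against the command reference for your release before use.

```text
HSENDCMD AUDIT DATASETCONTROLS(MIGRATION) NOFIX OUTDATASET(HSM.AUDIT.MIGDS)
```

NOFIX asks AUDIT to report what it finds without changing anything, which is a sensible first pass. Once you have reviewed the output, the same audit can be rerun with FIX to let AUDIT make the repairs it is able to.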


Diagnostic Patches

You can enable or disable dumps for diagnostics with the following patches.

Dump for Installation Exit Abends

PATCH .MCVT.+2D BITS(.......1)

Tracing OPEN/CLOSE/EOV problems

Tape
PATCH .MCVT.+F2 x'00' /* Activate */
PATCH .MCVT.+F2 x'FF' /* Deactivate */

DASD
PATCH .MCVT.+F3 x'00' /* Activate */
PATCH .MCVT.+F3 x'FF' /* Deactivate */

Causing a DUMP when DSS error occurs

PATCH .MCVT.+454'390' /* ADR390E */

Determining why SMS-managed datasets are not being processed

PATCH .MGCB.+26 x'FF' /* Space Mgt */
PATCH .BGCB.+24 x'FF' /* Backup */
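
PATCH is an ordinary HSM command, so the patches above can be entered like any other. As a sketch, using the space management patch from this section and assuming the started task is called HSM, it could be issued from the console (first line) or from TSO (second line):

```text
F HSM,PATCH .MGCB.+26 x'FF'
HSENDCMD PATCH .MGCB.+26 x'FF'
```

A patch entered this way lasts only until HSM is restarted. If you want one applied at every startup, the PATCH statement can be added to the ARCCMDxx member instead.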


HSM Hanging

When HSM appears to be hanging, the following should be checked before resorting to cancelling HSM.


Entering commands with MODIFY

If for any reason you cannot enter HSM commands from TSO, you can use the MVS MODIFY command from the system console. All MODIFY commands start with F followed by your started task name. The commands below assume the task is called HSM.

F HSM,CANCEL REQUEST(requestnumber)
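
A few other commands that are often useful from the console, again assuming the task is called HSM, are sketched below. The choice of commands is mine, picked as typical examples rather than a complete list.

```text
F HSM,QUERY ACTIVE
F HSM,QUERY REQUEST
F HSM,HOLD MIGRATION
F HSM,RELEASE MIGRATION
```

QUERY ACTIVE shows what HSM is currently doing, and QUERY REQUEST lists outstanding requests along with the request numbers that CANCEL REQUEST needs. HOLD and RELEASE stop and restart a function, migration in this case, without stopping HSM itself.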


Using VSAM RLS for the Control Datasets

VSAM Record Level Sharing is explained in a bit of detail in the VSAM RLS section. If you run multiple DFHSM tasks on multiple LPARs, then RLS can help DFHSM performance, as the individual tasks enqueue on the CDS files at record level instead of cluster level, so allowing concurrent reads and updates. VSAM RLS uses the sysplex Coupling Facility, and to use it the CDS files must be SMS managed.

To use RLS you need to work through the following tasks.

The control datasets must be SMS managed
The control datasets cannot be split by keyranges. This technique was commonly used to cope with very large files, especially the MCDS, but is not supported for z/OS release 1.3 and above anyway. Alternative options are to use extended addressing or multi-cluster CDS files.
The CDS datasets must also be flagged as non-recoverable by using the IDCAMS ALTER command with the LOG(NONE) parameter.
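
The last of these tasks could be done with a small IDCAMS job along the following lines. This is a sketch only: the job card and the data set names HSM.MCDS, HSM.BCDS and HSM.OCDS are hypothetical, so substitute your own CDS cluster names.

```jcl
//ALTERCDS JOB (ACCT),'CDS LOG NONE',CLASS=A,MSGCLASS=X
//* FLAG EACH CONTROL DATA SET AS NON-RECOVERABLE FOR RLS
//STEP1    EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  ALTER HSM.MCDS LOG(NONE)
  ALTER HSM.BCDS LOG(NONE)
  ALTER HSM.OCDS LOG(NONE)
/*
```

A LISTCAT of each cluster afterwards should confirm that the LOG attribute has been set.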

If you are already running RLS, then you need to check that the cache structure is at least 1MB. Otherwise you need to set up RLS as explained in the link above.

Finally, you need to add CDSSHR=RLS to all the DFHSM catalogued procedures, then recycle the started tasks.
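
In a startup procedure the keyword is normally passed through the PARM field. A trimmed sketch, assuming a procedure called HSM and leaving out the other PARM keywords and DD statements a real procedure would have, might look like this:

```jcl
//HSM      PROC CMD=00,CDSSHR=RLS
//HSM      EXEC PGM=ARCCTL,
//            PARM=('CMD=&CMD','CDSSHR=&CDSSHR')
```

To recycle, stop each task with F HSM,STOP and restart it with S HSM once it has ended.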


Using Large Format Disks for the Journal Files

Each HSM journal file must fit onto one disk and be in a single extent. This means that the files are limited in size by the available disk sizes, which meant no bigger than 8.5 GB if you were using Model 9 disks. This is not big enough for HSMplex systems, where several HSM instances could be sharing the same journal files.

Consider using large format Extended Address Volumes, sometimes called 3390-A disks. These can contain up to 223 gigabytes. The actual size of these volumes will vary depending on how they were created on the underlying hardware, but in general they are much bigger than standard volumes, and will resolve the issue of journal files being too small.
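
As a sketch of what allocating a bigger journal might look like, the job below creates a single-extent journal of 20,000 cylinders. The job card, the data set name HSM.JRNL, the volume serial EAV001 and the size are all hypothetical. DSNTYPE=LARGE is my assumption of what is needed here, since a sequential data set of this size exceeds the 65,535 tracks per volume that a basic format data set allows.

```jcl
//ALLOCJNL JOB (ACCT),'ALLOC JOURNAL',CLASS=A,MSGCLASS=X
//* ONE CONTIGUOUS PRIMARY EXTENT, NO SECONDARY SPACE
//STEP1    EXEC PGM=IEFBR14
//JOURNAL  DD DSN=HSM.JRNL,DISP=(NEW,CATLG),
//            UNIT=3390,VOL=SER=EAV001,
//            SPACE=(CYL,(20000),,CONTIG),
//            DSNTYPE=LARGE
```

Omitting the secondary quantity and coding CONTIG keeps the journal in one extent, which is the requirement described above.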



Lascon updates

I retired 2 years ago, and so I'm out of touch with the latest in the data storage world. The Lascon site has not been updated since July 2021, and probably will not get updated very much again. The site hosting is paid up until early 2023 when it will almost certainly disappear.
Lascon Storage was conceived in 2000, and technology has changed massively over those 22 years. It's been fun, but I guess it's time to call it a day. Thanks to all my readers in that time. I hope you managed to find something useful in there.
All the best