
Problems with the HSM started task


Finding HSM and its parameter library
How do I duplex HSM tapes?
Checking the activity logs
Started Task won't start
Auditing the CDS files
Patches to get specific dumps
HSM hanging
Using the MODIFY command
Using VSAM RLS for the Control Datasets

Finding the HSM started task and parameter library

The HSM startup JCL will be in one of your JES proclibs; which one is site dependent. If you use JES2, SYS1.PROCLIB(JES2) will contain a list of all the allocated procedure libraries.

The startup parameters are usually in member ARCCMD00 in SYS1.PARMLIB, but your site may well have changed this. The startup JCL will contain a PROC statement like

 procname  PROC CMD=xx

where xx is the two-character suffix of the parmlib member name, so CMD=00 selects ARCCMD00.
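If you have a copy of the procedure as text (for example, a member downloaded from the proclib), you can sketch the mapping from CMD=xx to ARCCMDxx in Python. This is a minimal, hypothetical helper, not an IBM tool. It assumes the relevant statement has been joined onto one line with JCL continuations removed. It also assumes the suffix is two alphanumeric or national (@ # $) characters.

```python
import re
from typing import Optional

# CMD=xx followed by a delimiter (comma, quote, parenthesis, blank) or end of text.
# Assumption: the suffix is two alphanumeric or national (@ # $) characters.
_CMD_RE = re.compile(r"\bCMD=([A-Z0-9@#$]{2})(?=[,'\s)]|$)", re.IGNORECASE)


def parmlib_member(jcl_text: str) -> Optional[str]:
    """Return the ARCCMDxx member selected by the first CMD=xx, or None if none is found."""
    match = _CMD_RE.search(jcl_text)
    if match is None:
        return None
    return "ARCCMD" + match.group(1).upper()
```

For example, parmlib_member("//DFHSM    PROC CMD=00,UID=HSM") returns "ARCCMD00". A symbolic reference such as 'CMD=&CMD' in the EXEC PARM is skipped, but a hardcoded 'CMD=01' there would be found.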

How do I duplex HSM tapes?

HSM tape duplexing is controlled by a SETSYS parameter in the SYS1.PARMLIB member, usually ARCCMD00. The parameter is simply

 SETSYS DUPLEX(BACKUP(N) MIGRATION(Y))

In this example, we are duplexing the migration tapes but not the backup tapes.

Checking the activity log

A good place to look for started task problems is the activity logs, but you cannot browse them while HSM has exclusive use of them. Issue the HSM command

RELEASE HARDCOPY

to release the activity logs that contain data (the command has no effect on empty logs). If ACTLOGTYPE is specified as SYSOUT, the logs are printed to the spool.
If the activity logs are allocated to DASD, no printing occurs. Instead, the current logs are closed and new logs are allocated to HSM, so the closed logs can then be browsed.

To see useful data in the logs, you need to have set the following parameter:

SETSYS ACTLOGMSGLVL FULL

This logs the results of all processing, successes and failures alike. It is the best option, though the logs can become large. The alternatives are EXCEPTIONONLY (failures only) and REDUCED (level 0 movement only).

You can control where the logs are written with the ACTLOGTYPE parameter:

SETSYS ACTLOGTYPE SYSOUT(class)
or
SETSYS ACTLOGTYPE DASD UNIT=diskesoteric SPACE=(TRKS,(n,m))

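The DASD logs are named sitehlq.Hhostid.function.Dyyddd.Thhmmss. In this name:
  • hostid is the HSM host ID from the PROC statement.
  • Dyyddd and Thhmmss are the date (two-digit year and day of year) and time the log was allocated.
  • function is CMDLOG (commands), BAKLOG (backup), DMPLOG (dump) or MIGLOG (migration).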
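When several logs have been switched, decoding the qualifiers helps you pick the right one. The sketch below is a hypothetical Python helper; parse_log_name and its field names are illustrative. It splits a DASD log name in the format above. It assumes the last four qualifiers are always Hhostid, function, Dyyddd and Thhmmss, so the site high-level qualifier may have more than one level. The century for the two-digit year comes from Python's %y rule (00-68 become 2000-2068), not from HSM.

```python
from datetime import datetime
from typing import NamedTuple

# Function qualifiers described in the text; anything else is reported as "unknown".
FUNCTIONS = {
    "CMDLOG": "commands",
    "BAKLOG": "backup",
    "DMPLOG": "dump",
    "MIGLOG": "migration",
}


class ActivityLog(NamedTuple):
    hlq: str            # site high-level qualifier(s)
    host_id: str        # host ID without the leading "H"
    function: str       # e.g. "MIGLOG"
    purpose: str        # e.g. "migration"
    allocated: datetime


def parse_log_name(dsn: str) -> ActivityLog:
    """Split sitehlq.Hhostid.function.Dyyddd.Thhmmss into its parts (hypothetical helper)."""
    parts = dsn.strip().upper().split(".")
    if len(parts) < 5:
        raise ValueError(f"too few qualifiers: {dsn!r}")
    *hlq, host, function, day, time = parts
    if not (host.startswith("H") and day.startswith("D") and time.startswith("T")):
        raise ValueError(f"unexpected qualifier layout: {dsn!r}")
    # Dyyddd = two-digit year + day of year (%j); Thhmmss = hours, minutes, seconds.
    allocated = datetime.strptime(f"{day[1:]}.{time[1:]}", "%y%j.%H%M%S")
    return ActivityLog(
        hlq=".".join(hlq),
        host_id=host[1:],
        function=function,
        purpose=FUNCTIONS.get(function, "unknown"),
        allocated=allocated,
    )
```

For example, parse_log_name("HSMACT.H1.MIGLOG.D24032.T101530") describes a migration log for host 1 that was allocated on 1 February 2024 at 10:15:30.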

HSM started task abends

There are two sources of problem information: dumps of the started task and PDA (Problem Determination Aid) traces. If HSM abends and is then restarted, VSAM error messages are issued when HSM attempts to open the control data sets (CDSs). HSM then issues a VERIFY and retries the open. This usually succeeds, and HSM should continue processing normally.
However, if HSM cannot initialize, one of the CDSs and/or the journal was probably damaged during the abnormal end. To fix this:

  • Analyze the dump resulting from the abnormal end to determine what is damaged

    If it is a CDS:

    • Recover it from the most recent backup copy with an IDCAMS IMPORT command.
    • Start HSM
    • Issue the UPDATEC command to combine the latest transactions in the journal data set with the restored backup copy of the control data set.

    If it is the journal:

    • Stop HSM
    • Reallocate a new journal data set
    • Restart HSM
    • Back up the CDSs.

If both a CDS and the journal are corrupt, you probably cannot do a clean recovery. You might need to use the AUDIT or FIXCDS commands to correct the records in the CDSs. AUDIT cross-checks the various records concerning data sets and HSM resources. It can list errors and propose diagnostic actions or, at your option, complete most repairs itself.

Auditing a CDS

Consider using the AUDIT command for the following reasons:

  • After any CDS restore (highly recommended)
  • After an ARC184I message (error when reading/writing DFSMShsm CDS records)
  • Errors on the RECALL or DELETE of migrated data sets
  • Errors on BDELETE or RECOVER of backup data sets
  • DFSMShsm tape-selection problems
  • RACF mismatch messages
  • Power or hardware failure

You can use AUDIT to cross-check the following sources of control information:

  • MCDS or individual migration data set records
  • BCDS or individual backup data set records or ABARS records
  • OCDS or individual DFSMShsm-owned tapes
  • DFSMShsm-owned DASD volumes
  • Migration-volume records
  • Backup-volume records
  • Recoverable-volume records (from dump or incremental backup)
  • Contents of SDSP data sets

It is best to run AUDIT at times of low system activity, as some audit functions can run for quite some time. Be aware that an audit will lock out some HSM activity while it runs.

SMS-managed volumes cannot be audited with the AUDIT command. The process for auditing SMS-managed data sets is similar to that for non-SMS-managed data sets.

If AUDIT is executing and a backup of the HSM CDSs is started (by the BACKVOL CDS command or by AUTOBACKUP), all HSM functions on the host that started the backup are halted. They stay halted until both the AUDIT and the CDS backup have completed.

Diagnostic Patches

You can enable or disable diagnostic dumps and traces with the following patches:

Dump for Installation Exit Abends

     PATCH .MCVT.+2D BITS(.......1)

Tracing OPEN/CLOSE/EOV problems

Tape - PATCH .MCVT.+F2 x'00'  /* Activate */
       PATCH .MCVT.+F2 x'FF'  /* Deactivate */
DASD - PATCH .MCVT.+F3 x'00'  /* Activate */
       PATCH .MCVT.+F3 x'FF'  /* Deactivate */

Causing a dump when a DSS error occurs

      PATCH .MCVT.+454'390'  /* ADR390E  */

Determining why SMS-managed datasets are not being processed

      PATCH .MGCB.+26 x'FF'  /* Space Mgt */
      PATCH .BGCB.+24 x'FF'  /* Backup    */


HSM Hanging

When HSM appears to be hanging, check the following before resorting to cancelling HSM.

  • Across all LPARs, check whether there are any outstanding messages that require a response. If all your LPARs are in a sysplex, the MVS command

      D R,R

    (entered at the console, or as /D R,R from SDSF) will display all outstanding prompts for all LPARs.
  • Check across all LPARs whether HSM is waiting on tape drives and if so prioritise making them available.
  • Check to see if there are any reserves or enqueues which are causing HSM to queue. How you do this will depend on what monitoring software you have installed onsite.
  • Check across all LPARs for active HSM tasks by issuing:

        F xxxxxx,Q ACT

    where xxxxxx is the HSM started task name. This shows whether HSM activity such as AUDIT or TAPECOPY on one LPAR is affecting activity on another LPAR.

  • Issue F xxxxxx,HOLD ALL across all LPARs (or HOLD AUDIT, HOLD TAPECOPY and so on). Wait a few minutes to see whether HSM activity returns to normal; this could take 10 minutes or longer. Afterwards, issue the matching RELEASE command, as held functions stay held until you release them.


Entering commands with MODIFY

If for any reason you cannot enter HSM commands from TSO, you can use the MVS MODIFY command from the system console. Every MODIFY command starts with F, followed by your started task name and a comma. The commands below assume the task is called HSM:

F HSM,SWAPLOG

F HSM,CANCEL USERID(userid)

F HSM,SETSYS etc

F HSM,CANCEL DATASETNAME(datasetname)

F HSM,CANCEL REQUEST(requestnumber)
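If you issue these commands from an automation script across several LPARs, you can build the command text in one place. That keeps every command in the same F task,command form as the examples above. The Python sketch below is hypothetical; modify is not a real API. It assumes started task names follow the usual PDS member-name rules (1-8 characters), and it only produces the text you would enter at the console.

```python
import re

# Assumption: started task (procedure) names follow PDS member-name rules:
# 1-8 characters, the first alphabetic or national (@ # $), the rest alphanumeric or national.
_TASK_RE = re.compile(r"[A-Z@#$][A-Z0-9@#$]{0,7}")


def modify(task: str, hsm_command: str) -> str:
    """Build 'F task,command' text for passing an HSM command via MVS MODIFY (hypothetical helper)."""
    task = task.strip().upper()
    if not _TASK_RE.fullmatch(task):
        raise ValueError(f"not a valid started task name: {task!r}")
    command = hsm_command.strip()
    if not command:
        raise ValueError("empty HSM command")
    # Same layout as the examples in the text: no blank after the comma.
    return f"F {task},{command}"
```

For example, [modify(t, "Q ACT") for t in ("HSM", "HSM2")] gives one query command per task. The task names here are hypothetical.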



Using VSAM RLS for the Control Datasets

VSAM Record Level Sharing (RLS) is explained in more detail in the VSAM RLS section. If you run multiple DFHSM tasks on multiple LPARs, RLS can improve DFHSM performance. The tasks serialize access to the CDS files at record level rather than at cluster level, which allows concurrent reads and updates.

To use RLS, work through the following tasks:

  • The control data sets must be SMS managed.
  • The control data sets cannot be split by key ranges. This technique was commonly used to cope with very large files, especially the MCDS, but is not supported for z/OS release 1.3 and above anyway. Alternatives are to use extended addressability or multi-cluster CDSs.
  • The CDSs must also be flagged as non-recoverable by using the IDCAMS ALTER command with the LOG(NONE) parameter.

If you are already running RLS, check that the cache structure is at least 1 MB. Otherwise, set up RLS as explained in the VSAM RLS section.

Finally, add CDSSHR=RLS to all the DFHSM catalogued procedures, then recycle the HSM started tasks.
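The data set and cache prerequisites above can be written down as a checklist. The Python sketch below is hypothetical. The ControlDataSet fields are illustrative values you would fill in yourself (for example, from an IDCAMS LISTCAT), and the cache structure size is assumed to be given in kilobytes. It does not check the CDSSHR=RLS procedure change.

```python
from dataclasses import dataclass

MIN_CACHE_KB = 1024  # the 1 MB minimum cache structure size from the text, in kilobytes


@dataclass
class ControlDataSet:
    # Hypothetical description of one CDS cluster; fill the values in yourself.
    name: str
    sms_managed: bool
    key_ranges: bool
    log: str  # the cluster's LOG setting, e.g. "NONE", "UNDO" or "ALL"


def rls_problems(cdss: list[ControlDataSet], cache_structure_kb: int) -> list[str]:
    """List the RLS prerequisites from the text that are not yet met."""
    problems = []
    for cds in cdss:
        if not cds.sms_managed:
            problems.append(f"{cds.name}: not SMS managed")
        if cds.key_ranges:
            problems.append(f"{cds.name}: split by key ranges")
        if cds.log.upper() != "NONE":
            problems.append(f"{cds.name}: LOG({cds.log}), use IDCAMS ALTER ... LOG(NONE)")
    if cache_structure_kb < MIN_CACHE_KB:
        problems.append(f"cache structure is {cache_structure_kb} KB, below 1 MB")
    return problems
```

The function returns a list rather than raising on the first failure, so one run shows every unmet prerequisite.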



Copyright © Lascon Storage Ltd. 2000 to present date. By entering and using this site, you accept the conditions and limitations of use