TDMF (Transparent Data Migration Facility) is used to move z/OS mainframe disk volumes without affecting active applications. It is completely independent of any hardware microcode, so it can be used to move disks between different vendors' storage equipment. Provided line speeds are adequate to deal with active I/O rates, it can also be used to move volumes between data centres. TDMF moves volumes; its sister product LDMF moves datasets.
TDMF can run as a started task but I've only ever seen it used in batch. In brief, the move process involves:
Check that all the move conditions are correct to ensure data integrity
Copy all tracks from the source to the target disk, while recording any tracks that have changed at the source
Copy all changed tracks to the target while still recording changed tracks at the source. Repeat this until the number of changed tracks is less than a threshold number
Temporarily quiesce the I/O from all applications on all LPARS, copy remaining changed tracks
Switch volumes so the source goes offline and the target online
Resume the applications
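The copy, refresh and quiesce steps above can be sketched as a toy model. This is only an illustration of the iterative idea, not how TDMF is implemented: the function name `migrate`, the list-of-tracks representation and the `updates_per_pass` argument (the application writes that arrive during each copy pass) are all hypothetical.

```python
def migrate(source, updates_per_pass, threshold):
    """Toy model of the copy / refresh / quiesce loop (illustrative only).

    source           -- list of track contents, updated by simulated writes
    updates_per_pass -- list of lists of (track, data) writes, one list per pass
    threshold        -- stop refreshing once fewer tracks than this have changed
    """
    target = [None] * len(source)
    to_copy = set(range(len(source)))      # first pass copies every track
    updates = iter(updates_per_pass)
    passes = 0
    while True:
        for track in to_copy:              # copy (or refresh) pass
            target[track] = source[track]
        passes += 1
        changed = set()
        for track, data in next(updates, []):
            source[track] = data           # application write during the pass
            changed.add(track)             # recorded as a changed track
        to_copy = changed
        if len(to_copy) < threshold:
            break
    # Quiesce: no further writes arrive, so one last copy synchronises the pair.
    for track in to_copy:
        target[track] = source[track]
    return target, passes


source = ["a", "b", "c", "d"]
updates = [[(0, "A"), (1, "B"), (2, "C")], [(0, "AA")]]
target, passes = migrate(source, updates, threshold=2)
print(target == source, passes)
```

The threshold trades off the length of the final quiesce against the number of refresh passes: a lower threshold means less to copy while I/O is held, but a busy volume may need more passes to get there.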
TDMF can move most z/OS volumes that are in ECKD format. The only exceptions are volumes containing local page datasets and coupling facility datasets. Native Linux volumes and VM volumes cannot be moved with TDMF as they are not in CKD format.
Before you start to use TDMF you need to run a job called SYSOPTN which sets up the security keys, defines some TDMF parameters and sets some default options. If your site has used TDMF previously then you may need to run it in update mode to refresh the licence. If you want to know what those default options are, then go into TDMF and take option 9 from the main panel. You will then see a screen that shows you all the installation options. If you press enter on this screen, you will also see the current TDMF version number.
A TDMF session can be defined as the set of batch jobs, Master and Agents, required to perform a TDMF migration, together with an associated COMMDS file. The following actions are needed to run a TDMF session:
Allocate the COMMDS. The COMMDS is a dataset used to communicate between LPARS.
Set up the Master Job. The Master system runs on one LPAR and controls the migration sessions.
Set up the Agent Jobs. The Agents run on the other LPARS and record the I/O activity on those LPARS.
Run the Master Job
Run the Agent Jobs. These jobs must start no more than 15 minutes after the Master.
Monitor
Setting up a TDMF session
The COMMDS
The COMMDS (sometimes called the SYSCOM file as it is pointed to by a SYSCOM DD statement in batch jobs) is used to pass information between the Master task and the Agent tasks on the other LPARS, and contains the status and messages related to a specific session. You also use the COMMDS in the TDMF TSO monitor to get information about current and past sessions. TDMF uses hardware RESERVES to serialise disk I/O and so the COMMDS must be placed on a quiet volume, preferably on a dedicated volume. Use a dataset name that is excluded from SMS just to ensure the COMMDS is placed on the volume that you want, and make sure it is not on a volume that you are moving with TDMF.
It is best to use a separate COMMDS for each job so that you can check previous TDMF runs. Incorporating the job name into the dataset name makes it easy to tie a COMMDS back to a specific job, so a good standard is TDMF.jobname.COMMDS.
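If you generate a lot of these names, a small helper can build them and catch invalid ones early. This is a minimal sketch: the function name `commds_name` is hypothetical, and the checks are just the standard z/OS dataset naming rules (qualifiers of 1 to 8 characters starting with a letter or national character, 44 characters in total), not anything specific to TDMF.

```python
import re

# Standard z/OS qualifier rule: 1-8 characters, first one alphabetic or
# national (#, @, $), the rest alphanumeric, national or hyphen.
QUALIFIER = re.compile(r"^[A-Z#@$][A-Z0-9#@$-]{0,7}$")


def commds_name(jobname, prefix="TDMF", suffix="COMMDS"):
    """Build a COMMDS dataset name following the TDMF.jobname.COMMDS standard."""
    name = ".".join([prefix, jobname.upper(), suffix])
    if len(name) > 44:
        raise ValueError(f"{name} is longer than 44 characters")
    for qualifier in name.split("."):
        if not QUALIFIER.match(qualifier):
            raise ValueError(f"{qualifier!r} is not a valid qualifier")
    return name


print(commds_name("migjob01"))
```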
The communication dataset must be allocated on a cylinder boundary with contiguous space, and it must be on a CKD/E disk. If you are relocating to a new disk subsystem or Data Center then it is best to allocate a TDMF specific, non-SMS volume on the target subsystem.
The syscom dataset itself is formatted and cannot really be browsed directly; you need to view it with TDMF.
The size (number of cylinders required) is based upon the following formula:
CYLS = V * (S + K)
Where:
V is related to the number of volumes:
· 64 volumes V = 2.5
· 128 volumes V = 5.0
· 256 volumes V = 7.5
· 512 volumes V = 10.0
S is the number of participating systems or LPARS
K is related to the size of the source volumes involved:
· 3390-3 K = 4
· 3390-9 K = 6
· 3390-27 K = 15
For example, suppose you are moving 128 3390-3 and 128 3390-9 volumes across 8 LPARs. That is 256 volumes in total, so V = 7.5, and setting K for the largest device type in the session (the 3390-9) gives K = 6:
CYLS = 7.5 * (8 + 6)
CYLS = 105
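The sizing formula is easy to script if you have many sessions to plan. The sketch below uses the V and K values listed above; the function name `commds_cylinders` is hypothetical, and it assumes the volume counts in the V table are upper bounds ("up to 64 volumes", and so on), which the table does not state explicitly. Check the TDMF manual before relying on it.

```python
import math

# (maximum volumes, V) pairs from the table; "up to N volumes" is an assumption.
V_BY_MAX_VOLUMES = [(64, 2.5), (128, 5.0), (256, 7.5), (512, 10.0)]
K_BY_DEVICE = {"3390-3": 4, "3390-9": 6, "3390-27": 15}


def commds_cylinders(volume_count, systems, device_types):
    """Return CYLS = V * (S + K), rounded up to a whole cylinder."""
    for max_volumes, v in V_BY_MAX_VOLUMES:
        if volume_count <= max_volumes:
            break
    else:
        raise ValueError("volume count is beyond the values in the table")
    # K is taken for the largest device type in the session.
    k = max(K_BY_DEVICE[device] for device in device_types)
    return math.ceil(v * (systems + k))


# The worked example: 128 + 128 volumes, 8 LPARs, largest device 3390-9.
print(commds_cylinders(256, 8, ["3390-3", "3390-9"]))  # 105
```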
If you are allocating a lot of COMMDS files up front for a migration, the best way to do it is to use an in-stream JCL procedure like this.
TDMF has one quirk that you should be aware of when using a COMMDS: it bypasses the VTOC and accesses the data by absolute track. Suppose that you are running a lot of TDMF jobs, and you have one disk that you are using to store all your COMMDS files. When you complete a section of the migrations, you may want to wipe that disk and start again, so you delete all the old COMMDS files and allocate a new set. Now, if you go into TDMF and pick up a new COMMDS from a job that has not run yet, say using option 7 - PAST SESSION DISPLAY DETAILS, then instead of telling you there is no data to display, it will retrieve the data that still exists on disk from the old COMMDS and return it. This can cause a bit of confusion! This is not a bug, it is a feature of TDMF. The only foolproof answer is to always use a new, clean volume if you can.
The Master task
There is only one master task per session, but a Master task can handle multiple groups of volumes. All active LPARS must be defined to the Master system to prevent data corruption issues. You should place your Master task on the LPAR with the most update activity to the volumes that you are moving. It is possible to run several Master tasks in parallel, but they should all have their own COMMDS and you must run either GRS or MIM to ensure data integrity.
The two MIGRATE statements above identify two volumes, XPA02A/B that are moving to the addresses used by volumes SPA92A/B. You would normally want more than two volumes in a job. How many volumes should you have? That is really up to you, but bear in mind that if you make your jobs too small, then running and checking them will be very laborious, while if you make them too big, you have to wait a long time for them to finish, especially if you are having problems. It is not a good idea to cancel a migration.
As a rule of thumb, it takes about 2-3 minutes to move a 3390-3 and 5-8 minutes to move a 3390-9. I'd suggest a 30 minute job runtime is reasonable, so if you are running 10 migrates in parallel, then 120 mod 3s or 60 mod 9s could be appropriate. The exception is system volumes like SYSRES, PAGE, SPOOL etc., which I'd always run individually.
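The arithmetic behind those numbers can be sketched as follows. The function name `volumes_per_job` is hypothetical, and the per-volume timings are only the rule-of-thumb figures quoted above: the 120 figure corresponds to about 2.5 minutes per 3390-3, and the 60 figure to the optimistic 5 minutes per 3390-9.

```python
def volumes_per_job(runtime_minutes, parallel_migrations, minutes_per_volume):
    """Estimate how many volumes fit in one job for a target runtime."""
    # Whole volumes that one migration stream can finish in the runtime.
    per_stream = int(runtime_minutes // minutes_per_volume)
    return per_stream * parallel_migrations


print(volumes_per_job(30, 10, 2.5))  # mod 3s at about 2.5 minutes each
print(volumes_per_job(30, 10, 5))    # mod 9s at the optimistic 5 minutes each
print(volumes_per_job(30, 10, 8))    # mod 9s at the pessimistic 8 minutes each
```

Running the pessimistic case as well gives you a feel for how much a job could overrun if the volumes are busy.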
The OPTIONS can be used to override the global options set by the SYSOPTN job. These are SAMPLE options and may not be valid for your site. I'd suggest that you check the TDMF manual for a full explanation of these options, select the ones that work for you then TEST THEM on appropriate data and systems.
UNIDENTIFIEDSYSTEMS determines what to do about systems or LPARS that have a route to a volume but have no active Agent. Options are IGNORE, WARN, ERROR and TERMINATE. This is not a foolproof way of identifying all systems, as it depends on 3990-6 controller facilities and not all vendors support these.
CHECKTARGET means check that the volume is empty before proceeding
NOPROMPT means TDMF will not send out a confirmation message before synchronising source with target.
The RELABEL(TD) option means that when the migrates are complete, the target volumes will be relabelled as TDA92A/B. Alternatively, you can code this explicitly for each volume in the migrate statements as follows
FASTCOPY means that TDMF will just copy used bytes to the target disk. This is appropriate for new disks, but may not be a good idea if you are copying over existing data.
PACING means that TDMF will initially move 15 tracks at a time, but will reduce this if it finds that the disk is busy.
NOPURGE means do not delete off any existing data on the target that was not overwritten by the source data.
The Agent Tasks
There must be an Agent on every LPAR except the Master LPAR. The Agents:
Communicate with the Master for migration requests
Monitor Source I/O activity on their LPAR
Monitor Target I/O activity on their LPAR
Notify the Master about any Source I/O updates
It is possible to run more than one Agent job in an LPAR, as long as each Agent is associated with a different Master task and communicates with that task through a separate and unique COMMDS. All the Agent tasks on every LPAR must be started within 15 minutes of the Master task starting or the session will time out. However, an Agent can be started before the Master.
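If you are reconciling job start times after a session has timed out, the 15 minute rule is simple to check. This is a sketch only: the function name `late_agents` and the dictionary of start times are hypothetical, and you would have to collect the start times yourself, for example from the job logs.

```python
from datetime import datetime, timedelta

START_WINDOW = timedelta(minutes=15)


def late_agents(master_start, agent_starts):
    """Return the LPARs whose Agent started more than 15 minutes after the Master.

    An Agent that started before the Master is fine, so only late starts count.
    """
    deadline = master_start + START_WINDOW
    return sorted(lpar for lpar, started in agent_starts.items()
                  if started > deadline)


master = datetime(2009, 2, 28, 1, 0)
agents = {
    "SYSA": datetime(2009, 2, 28, 0, 58),   # started before the Master
    "SYSB": datetime(2009, 2, 28, 1, 10),   # inside the window
    "SYSC": datetime(2009, 2, 28, 1, 20),   # too late
}
print(late_agents(master, agents))
```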
If all your participating LPARS are in a single SYSPLEX then you can easily set the Master and all the Agents in one PDS member, using
/*JOBPARM S=system
route commands to make sure the correct job runs on the correct system, then you just type 'SUB' once to run all the jobs.
The 7 phases of TDMF
The Master system initiates and controls all migrations/replications. The Master initiates each phase and all Agents must acknowledge this to proceed.
SYSTEM INITIALISATION phase
System initialisation involves the Master task and all the Agent tasks starting up within 15 minutes, and reporting error-free validation for all volumes within a session. Checking includes making sure that no other LPARS are accessing those volumes and, if the TDMF session has been set up to use SAF, that the volumes have the correct SAF authorisation.
INITIALISATION phase
This phase confirms that the source and target volumes are valid and if requested, waits for the Operator to reply to the Confirm WTOR. Once this is confirmed, the volume-level control blocks and real storage frames are allocated.
ACTIVATION phase
This phase starts the copy task and enables the monitoring of user I/O activity. While the data is copied from source to target, if updates to a source volume are detected in any participating LPAR, the Master system gathers that information for the REFRESH task.
Once all the tracks are copied by the COPY volume task, the Master then starts the copy REFRESH task. Further updates may happen, so the Master will run multiple refresh tasks until TDMF determines that synchronization of the target volume may be achieved, at which time, the Master system will move on to the Quiesce phase.
QUIESCE phase
The Master system instructs all Agents to stop all I/O activity to the source volume and pass it a final list of all updated tracks. The Master then performs a copy synchronous task to make the target disk a replica of the source.
VOLUME I/O REDIRECT phase
All I/O is now permanently redirected to the target volume, which is effectively now the source volume. Once the redirect request is successful, the Master rewrites volume labels on both source and target.
RESUME phase
The Master initiates a resume request via the Agents, to resume all I/O activity, now directed to the Target volume, and the original Source is varied offline.
TERMINATE phase
When a volume completes a migration, that volume's fixed storage will be freed for possible re-use within the current session.
Hints and tips
The key to a successful TDMF migration is careful planning up front, which applies to most projects of course. There are a few datasets and volumes that need special treatment and some of them are discussed here, but consult your TDMF manual to get a full picture.
Unidentified LPARS
Many large sites have more than one SYSPLEX, or they have LPARS that are not part of the SYSPLEX. It is probable that some volumes will be shared and online between SYSPLEXES or rogue LPARS. Typically these will be IODF volumes, tape management system volumes or volumes used by Sysprogs for various nefarious purposes.
This means that you have a good chance of a TDMF job failing with an unidentified system message. If you are migrating an entire string of disks, and that string has one volume online to an unidentified system, then the entire string will fail with an error message like
TDM2381I This source volume connected to 1 unidentified system(s).
TDM2382I 8000029880 2094 02/28/2009 01:18:46.
The answer is to find the rogue LPAR, look at the string and check out which volumes are online. You will have already identified these in your detailed planning of course. You can then safely change the UNIDENTIFIEDSYSTEMS(TERMINATE) parameter in the MASTER job to UNIDENTIFIEDSYSTEMS(IGNORE) and rerun the job for those volumes that are NOT online to the rogue LPAR. For the volumes that are online, you need to allocate a COMMDS that can be accessed from every LPAR, then rerun the jobs for those disks with an agent on every LPAR including the rogue one. Alternatively, you can move the data without using TDMF.
The LPAR can be identified from the TDM2382I message. The last four characters of the identifier, "9880" in this example, are the CPU number, and the fifth from last character, "2", is the LPAR number.
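If you have a lot of these messages to work through, the identifier can be picked apart with a few lines of code. This is a minimal sketch based only on the single example above: the function name `decode_tdm2382i` is hypothetical, and it assumes the identifier is always the first token after the message number.

```python
def decode_tdm2382i(line):
    """Extract the CPU and LPAR numbers from a TDM2382I message line."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "TDM2382I":
        raise ValueError("not a TDM2382I message")
    identifier = tokens[1]
    if len(identifier) < 5:
        raise ValueError("identifier is too short to decode")
    return {
        "cpu": identifier[-4:],    # last four characters
        "lpar": identifier[-5],    # fifth from last character
    }


print(decode_tdm2382i("TDM2382I 8000029880 2094 02/28/2009 01:18:46."))
```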
Special Datasets and Volumes
Be aware of where your TDMF load library resides and take care not to try to move it with TDMF. When you need to move that volume, move the load library to another disk and temporarily APF authorise it.
Watch out for various control datasets, like DFHSM BCDS / MCDS / OCDS, HSC, RACF and MIM control files. TDMF can move them, but it is recommended that you move them one at a time. For absolute safety, it is recommended that DFHSM and DFRMM be stopped when moving their control datasets.
Several CA products have control files that need special handling. For example, if you migrate the CA7 Commsds then you should shut down both CA7 and ICOM. See the manual for full details.
If your work volumes are SMS managed and you have sufficient capacity, it is also a good idea to QUINEW the volumes in SMS before the move, to prevent new allocations and limit the amount of active I/O to the disks:
V SMS,VOL(xxxxxx,ALL),Q,N
When you are finished, enable them again with
V SMS,VOL(xxxxxx,ALL),E
Note: These SMS commands act over all LPARS and while that is the correct action to disable or quiesce volumes, that might not be the correct enabled configuration for your site.
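For a long list of volumes it can save typing to generate the commands. The sketch below simply fills the volser into the two command forms shown above; the function name `sms_vary_commands` and the action keywords are hypothetical, and the note about the ALL keyword applies just as much to generated commands.

```python
def sms_vary_commands(volsers, action):
    """Build V SMS commands for a list of volsers.

    action is "quiesce_new" (Q,N) before the move or "enable" (E) afterwards.
    """
    suffix = {"quiesce_new": "Q,N", "enable": "E"}[action]
    return [f"V SMS,VOL({volser},ALL),{suffix}" for volser in volsers]


for command in sms_vary_commands(["WORK01", "WORK02"], "quiesce_new"):
    print(command)
```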
When you move SYSRES volumes, the unit address will change, and it's the unit address that is used to reference a SYSRES volume at IPL time, not the VOLSER. So when you move a SYSRES volume, you need to let the Operators and System programmers, and anyone else who might be interested know about the new unit address. It is good practice to move the SYSRES and alternate SYSRES volumes in separate sessions.
JES Spool volumes are usually busy so when you move them, do them in a quiet period, one at a time and set them to Drain to prevent new access. Remember to change the JES CHKPT addresses in SYS1.PARMLIB(COMMND00)
You cannot move local page datasets with TDMF. Use the command D ASM to identify which page datasets are local. It's best to move these with sysprog assistance to ensure the correct PAGEADD and PAGEDEL commands are used and the correct system definition files are updated.
The best way to handle SYSPLEX coupling datasets is to just switch to the alternates, then move the originals when they are not in use. If you use GDPS then you must switch the datasets with a GDPS script or you may cause a system outage.
How to make sure no-one else uses your Targets
The best way to set your targets up is to initialise them as SMS in your ICKDSF job by using the SG parameter, but do not add them to any SMS storage pool. This means that a non-SMS user cannot allocate data on them as they are SMS defined, but they cannot be used by SMS as they are not in a pool. When TDMF migrates a source onto a target it copies the VTOC and the VOLSER from the source, so the target then becomes usable. If you really want your target volumes in an SMS pool, then they must be set to DISNEW. Use the command
V SMS,VOL(xxxxxx,ALL),D,N
Hardware considerations
If you are a GDPS user then set Hyperswap to OFF while doing any TDMF moves.
The volumes that you are moving cannot be in an active Flashcopy relationship.
If a TDMF job is cancelled for any reason then the source or target volumes could have an invalid DPTSIO pointer. If your COMMDS is intact, you can fix this by running the original TDMF jobs again, with PARM=RECOVERMASTER or RECOVERAGENT as appropriate.
Softek recommends that you switch DASD fast write cache off while moving work volumes. Your DASD may not allow this, but if it does, the command is
SETCACHE VOLUME(xxxxxx) CACHEFASTWRITE UNIT(3390) OFF
Run the same command with ON in place of OFF to switch it back on once the moves are complete. Note that while the command just references one volume, cache is actually turned off for the whole subsystem.
If you move a smaller disk to a larger disk then you need to rebuild the VTOCIX so it recognises the extra free space. The best way to do this is to use the TDMF parameter EXTVTOC and let TDMF do it as part of the move. Otherwise, you need to vary the volume offline to all other LPARS, then run the following job
TDMF will clip the source volume to a different volser at the end of the move. All this does is change the volume label (Change Label In Place); the original data is still there, and the original VTOC too. If you are not decommissioning the subsystem, then this may cause confusion later as it looks like the volume is full of data, so it is good practice to re-initialise the disks once the moves are complete. If you are decommissioning the volumes, then you really should be running something like FDRERASE to wipe the data anyway.
TDMF testing
If you are trying to put together a complex migration plan and you are struggling to get your head around all the variables, then you can validate the correctness of your TDMF job by specifying
EXEC PGM=TDMFMAIN,PARM=(MASTER,SCAN) in your master JCL. This is an excellent way of finding out any potential problems with a TDMF move before trying to execute it for real.
The output from the job looks something like
TDM1177I The source volume RGS002 is mounted on device 9198 on this system.
TDM1186I The target volume RRDF42 is mounted on device 8941 on this system.
other volumes lines, including any errors
TDM2405I This volume successfully selected for initialization.
TDM2281I The Master system is starting the initialization process for a volume.
TDM2283I The Master system is starting the migration process for a volume.
TDM2722I Volume termination requested by "SCAN ONLY".
TDM2293I The Master system is starting the termination process for a volume.
TDM2303I The Master system has completed the migration process for a volume.
TDM2410I All storage frames to migrate this volume have been successfully page freed
The source and target volumes are not affected in any way by the test. The only problem with this is that if you check the status of the run through option '6' on the TDMF panels, all the disks are in 'terminated' mode, as they were terminated by the scan. However, if the job ends with CC=0 then that is a good indication of success.
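With a lot of volumes in a job, the scan output gets long, and a quick filter can help you spot anything unusual. The sketch below is an assumption-heavy illustration: every message in the sample output above ends in "I", and the code assumes that any other suffix letter marks something worth reading. That convention is not confirmed here, so check the TDMF messages manual. The function name `non_informational` is hypothetical.

```python
import re

# Message number such as TDM2405I, followed by the message text.
MESSAGE = re.compile(r"^(TDM\d{4})([A-Z])\s+(.*)$")


def non_informational(lines):
    """Return (message id, text) pairs for TDM messages not ending in 'I'."""
    flagged = []
    for line in lines:
        match = MESSAGE.match(line.strip())
        if match and match.group(2) != "I":
            flagged.append((match.group(1) + match.group(2), match.group(3)))
    return flagged


output = [
    "TDM1177I The source volume RGS002 is mounted on device 9198 on this system.",
    "TDM2405I This volume successfully selected for initialization.",
    'TDM2722I Volume termination requested by "SCAN ONLY".',
]
print(non_informational(output))
```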
CSA issues
TDMF retains some ECSA storage in SP228 key0 even after jobs complete, so a long run of TDMF migrations can cause system CSA shortages. The Softek recommendation is to IPL at the end of migrations to clear out any retained storage. This is an issue with TDMF v4 and earlier as it allocates storage using the Resource Manager. The issue is fixed in version 5, as now TDMF allocates its own storage using directed LOAD/DELETE/FREEMAIN calls.
If you cannot schedule an IPL, then try stopping and starting the initiators that were used for the TDMF jobs as that releases the CSA.