z/OS data migration with FDRPAS
The problem with moving disks
One of the problems that the storage person has to deal with from time to time is moving disk volumes. In the past two or three years I've had to do this when freeing up an ESS array to reformat it, when installing new disk subsystems and moving the data off the old ones, and when relocating to a new machine hall. Another possible reason is to relocate a volume that is performing badly onto a less busy disk string.
What makes this a problem is that it is almost impossible to get system downtime these days to expedite the move. I used to be able to book an exclusive Sunday day shift for disk maintenance. Now, even getting a couple of hours on a Saturday night shift is difficult.
Fortunately, there are products out there that will move disks without needing system downtime and with minimal impact on user applications. One of these is FDRPAS.
FDRPAS supports the big four mainframe storage hardware vendors (IBM, EMC, SUN and Hitachi) and will move disks between different vendors' hardware. FDRPAS will handle every type of volume, including SYSRES volumes; the only exception is volumes containing local page or swap datasets.
There are two general types of FDRPAS processing: single system image, and multiple images in a sysplex. Before we take a look at these, here are a few general comments about FDRPAS.
- Because FDRPAS is working alongside active applications on potentially disparate hardware, it is essential that you have the correct levels of microcode and software PTFs applied. Innovation, the supplier of FDRPAS, will advise you on this.
- All the currently active volumes ('Source volumes') must be online, and all the volumes that you are going to move to ('Target volumes') must be offline to all LPARs. At the end of the job, FDRPAS will vary the Source volume offline, then vary the Target volume online.
- One of the issues with doing a disk-to-disk copy with the VOLSER preserved is that you end up with two identical disks on the system, though only one of them can be online. This causes problems at IPL time, as z/OS does not know which one to mount and will ask the operator to decide. If the original volume is mounted by mistake, the data will be back-leveled, and this can be catastrophic. FDRPAS automatically modifies the label of the Source volume when it varies it offline, so this situation cannot happen.
- FDRPAS can be initiated with batch jobs, from a started task or with ISPF panels. The examples below assume that you will use batch jobs.
- Before you run a swap, it is a good idea to check that the various disk components are not faulty. This means checking the VTOC, the VTOC index (VTOCIX) and the VVDS. The catalog section shows you how to check the VVDS. FDRCPK is a good tool for checking the integrity of the VTOC and VTOC index.
- After the swap has completed, it is a good idea to wipe the data from the source volume with FDRERASE.
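The VVDS check mentioned in the list above can be done with the standard IDCAMS DIAGNOSE command. This is a minimal sketch, assuming a source volume D30001 on a 3390 device and the usual SYS1.VVDS.Vvolser naming convention; the step name and the VVDSDD DD name are arbitrary choices for illustration.

```jcl
//* Check the VVDS on source volume D30001 before swapping it.
//* Volume serial and unit type are assumptions; change them to suit.
//DIAGVVDS EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//VVDSDD   DD DSN=SYS1.VVDS.VD30001,VOL=SER=D30001,UNIT=3390,
//            DISP=SHR,AMP=AMORG
//SYSIN    DD *
  DIAGNOSE VVDS INFILE(VVDSDD)
```

Any inconsistencies IDCAMS finds are reported in SYSPRINT, so they can be fixed before FDRPAS copies the volume.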
Running FDRPAS on a single LPAR
All the volumes to be moved must be accessed by only a single z/OS image.
You start the process by running a batch job like this.
//SWAP     EXEC PGM=FDRPAS,REGION=0M
//STEPLIB  DD DISP=SHR,DSN=your.fdrpas.loadlib
//SYSPRINT DD SYSOUT=*
//FDRSUMM  DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//SYSIN    DD *
 SWAP TYPE=FULL,MAXTASKS=3,CHECKTARGET=YES
 MOUNT VOL=D30001,SWAPUNIT=FC01
 MOUNT VOL=D30002,SWAPUNIT=FA20
 MOUNT VOL=D30003,SWAPUNIT=F134
This job will concurrently move the three volumes identified by the VOL= parameters to the addresses identified by the SWAPUNIT= parameters. CHECKTARGET=YES is an optional safety feature that makes sure the target volumes are empty. Each job can handle up to 32 pairs of volumes in parallel, and you can run several jobs concurrently. In this example, MAXTASKS=3 means that three volumes are moved in parallel.
The move process goes through five phases.
Phase1 Validate the swap request
FDRPAS checks that the conditions are correct for the move: both devices must be the same type (3380 or 3390), the target device must be offline and at least as big as the source, and the source device must not contain an active page dataset.
Phase2 Install IO intercept
You are moving the volumes while they are active and the move will take some time, so FDRPAS needs to be able to detect IO activity on the source volumes. To do this, it briefly suspends IO to the source volume while it installs an IO intercept. The suspension is short and should not affect active applications.
Phase3 Copy data
FDRPAS now copies data to the target volume. All used tracks (for inactive data sets) and all allocated tracks (for active data sets) are copied, while FDRPAS simultaneously detects updates to the Source.
Phase4 Re-copy updated tracks
FDRPAS now re-copies any tracks that were updated after the copy process started. If more than 150 tracks have changed, it copies them while IO is still active. It then repeats this process until fewer than 150 tracks remain to be copied, at which point it suspends IO again while it copies the last few tracks.
Phase5 Swap completion
At the end of the consolidation process, the Source and Target volumes are identical. All I/O activity to the Source is then quiesced for a second or so while the Source volume is taken offline and the Target varied online, and the active applications are swapped across to the Target volume.
This process is illustrated in the movie below.
Running FDRPAS on multiple LPARs
Most z/OS sites run with several LPARs accessing disks concurrently, usually in a parallel sysplex. In this case, you need to ensure that the starting conditions are correct for all LPARs and that FDRPAS intercepts active IO from every LPAR that is accessing the Source disk. To do this, FDRPAS has two types of task: Swap tasks and Monitor tasks.
You will only run one Swap task per move, and ideally you should run it on your most active system. A Swap task can process up to 32 volumes in parallel, but hundreds of volumes can be included in one job. The MAXTASKS parameter governs how many volumes will be moved at once.
Monitor tasks run on all the other LPARs. Each one checks that the source device is online and the target device offline to its LPAR, and intercepts all active IO on that LPAR. The Monitor task also performs the physical disk swap on its LPAR when the copy is complete.
The multi-LPAR move process goes through eight phases.
Phase1 Start monitor tasks
Monitor tasks must be started on every LPAR that has access to the Target disks, including the one that will run the swap task. The sample JCL below could be used to run a monitor in batch. Note that the JCL only identifies the Target units.
//MONITOR  EXEC PASPROC
//SYSIN    DD *
 MONITOR TYPE=SWAP
 MOUNT SWAPUNIT=1100
 MOUNT SWAPUNIT=1101
 MOUNT SWAPUNIT=1102
The monitor will watch for a SWAP task starting on a different LPAR that is swapping to the same devices. The monitor task will run until a swap to the target has completed.
Phase2 Start a Swap task
The swap task should be started on the busiest LPAR. This task specifies both the Source and Target devices. Sample JCL for a Swap task is:
//SWAP     EXEC PGM=FDRPAS,REGION=0M
//STEPLIB  DD DISP=SHR,DSN=your.fdrpas.loadlib
//SYSPRINT DD SYSOUT=*
//FDRSUMM  DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//SYSIN    DD *
 SWAP TYPE=FULL,MAXTASKS=3,CHECKTARGET=YES
 MOUNT VOL=D30001,SWAPUNIT=1100
 MOUNT VOL=D30002,SWAPUNIT=1101
 MOUNT VOL=D30003,SWAPUNIT=1102
Phase3 Validate the swap request
FDRPAS checks that the conditions are correct for the move on all LPARs.
Phase4 Check Monitor tasks status
The Swap task initiates this by issuing a 'swap pending' message. The monitor tasks intercept this message and reply back if they are ready to participate.
Phase5 Install IO intercept
The Swap task signals to the Monitors that the swap process has started, suspends IO and installs the IO intercept. The Monitor tasks also suspend IO to the source volume while they install an IO intercept on every LPAR.
Phase6 Copy data
Once the Monitors tell the Swap task that all IO intercepts are in place, the Swap task starts to copy data to the target volume. While the copy is in progress, the Monitor tasks trap IO updates and pass the list of updated tracks to the Swap task.
Phase7 Re-copy updated tracks
The Swap task now re-copies any tracks that were updated after the copy process started. The Monitor tasks continue to trap IO updates and pass them to the Swap task, which repeats the process until the two disks are identical.
Phase8 Swap completion
The Swap task and the Monitor tasks suspend all I/O activity to the Source on their respective LPARs while the Source volume is taken offline and the Target varied online everywhere, and then the active applications are swapped over.
It should be obvious that it is vitally important to run a correctly defined monitor on every LPAR in the sysplex that has the devices defined in its IO configuration. If any LPAR is missed, its IO activity will not be intercepted and the copied data could be corrupt. FDRPAS can interrogate newer storage subsystems such as the 3990-6 or 2105 to find out how many LPARs are attached to them. With older disk subsystems, you have to tell FDRPAS how many LPARs are contributing, using the #SYSTEMS= parameter. **WARNING** IN THIS CASE IT IS YOUR RESPONSIBILITY TO GET THE NUMBER OF SYSTEMS RIGHT.
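As a sketch only, this is how a swap of one volume might look on an older subsystem shared by three LPARs. It assumes that #SYSTEMS= is coded on the SWAP statement alongside the operands used earlier, and the value 3 is purely illustrative; confirm both the placement and the count against the FDRPAS manual for your release.

```jcl
//* Assumption: #SYSTEMS= is coded on the SWAP statement.
//* 3 is an example count of LPARs sharing the devices.
//SWAP     EXEC PGM=FDRPAS,REGION=0M
//STEPLIB  DD DISP=SHR,DSN=your.fdrpas.loadlib
//SYSPRINT DD SYSOUT=*
//FDRSUMM  DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//SYSIN    DD *
 SWAP TYPE=FULL,MAXTASKS=3,CHECKTARGET=YES,#SYSTEMS=3
 MOUNT VOL=D30001,SWAPUNIT=1100
```

A Monitor task would still be needed on each of the other LPARs, exactly as in Phase1.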
The simulation feature of FDRPAS will display all of the systems that have access to the Source volumes specified.
//SIMSWAP  EXEC PGM=FDRPAS,REGION=0M
//STEPLIB  DD DISP=SHR,DSN=your.fdrpas.loadlib
//SYSPRINT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//SYSIN    DD *
 SIMSWAP TYPE=FULL,MAXTASKS=3,CHECKTARGET=YES
 MOUNT VOL=D30001,SWAPUNIT=1100
 MOUNT VOL=D30002,SWAPUNIT=1101
 MOUNT VOL=D30003,SWAPUNIT=1102
Let's start by assuming that you are either running with a newer disk subsystem, or that you specified the correct number of systems on an older subsystem. If you forget to start a monitor on one system, or if a monitor has a target device coded incorrectly, FDRPAS will wait for a while to get responses from all its monitor tasks, then issue an FDRW68 WTOR with the options 'reply RETRY,NO,YES'. If you get this message, try to correct the condition that caused it (e.g. not starting monitors on all relevant systems) and reply 'RETRY'. If the message is then issued again, the vendors recommend that you contact them.
If you reply 'YES', the Swap will proceed with no IO protection on the LPAR whose monitor did not respond, and at the end of the Swap that device will not be swapped on that LPAR. Data corruption is almost guaranteed to happen.
An FDRPAS copy process can be terminated at any time before the final SWAP has completed, either through the ISPF panels, or with the z/OS STOP command. This can be done without affecting the original device or any applications using it.
If you stop a Swap task, any swaps that are already active will be allowed to complete, but any pending swaps requested in that same task will not start. If you specify the CANCELPROT=YES parameter (the default is NO), a cancel command will be treated like a stop command and active swaps will run to completion. This can be overridden by issuing the cancel command twice.
If you use the LARGERSIZE=YES parameter, FDRPAS will move the data to a larger-capacity disk, for example from a 3390 model 9 to a model 27. z/OS records the free space on a disk in the VTOC index, so if you move from a smaller disk to a larger one, that free space map has to be updated to show the extra space. FDRPAS does that for you automatically.
FDRPAS can send messages when a Swap task completes, either by e-mail, using an FDREMAIL DD statement, or as TSO notifies.
FDRPAS can also work with FDRinstant to run point-in-time backups, but that is outside the scope of this page.
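As a sketch, assuming CANCELPROT= is coded on the SWAP statement like the other operands in the earlier examples (check the FDRPAS manual for your release), the SYSIN input for a cancel-protected swap of two volumes might look like this. The volumes and units are illustrative.

```jcl
//* Assumption: CANCELPROT= is a SWAP statement operand.
//* A single cancel is then treated like a stop.
//SYSIN    DD *
 SWAP TYPE=FULL,MAXTASKS=2,CHECKTARGET=YES,CANCELPROT=YES
 MOUNT VOL=D30001,SWAPUNIT=1100
 MOUNT VOL=D30002,SWAPUNIT=1101
```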
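A minimal sketch of the SYSIN input for moving volume D30001 onto a larger target at unit FC01, assuming LARGERSIZE= is coded on the SWAP statement in the same way as the other operands; the volume, unit and device models are illustrative only.

```jcl
//* Assumption: LARGERSIZE= is a SWAP statement operand.
//* Example: 3390 model 9 source moved to a model 27 target.
//SYSIN    DD *
 SWAP TYPE=FULL,CHECKTARGET=YES,LARGERSIZE=YES
 MOUNT VOL=D30001,SWAPUNIT=FC01
```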