I've assumed so far that PPRC will be the mirroring product of choice if the distance between sites is less than 20km. So what if your sites are 2000km apart? And what are the alternatives to PPRC for short distance mirroring? Here are a few other options. This is not an exhaustive description of what the products do or of their command sets, just an indication of why you might want to pick them over PPRC.
When you choose a mirroring product, one of the biggest factors to consider is what kind of DASD you already have, and what you expect to have eventually.
PPRC & related synchronous products need to mirror like to like. PPRC
is IBM to IBM, SRDF is EMC to EMC, HRC is HDS to HDS. This kind of limits
you, as you don't really want to run more than one mirroring product,
but also, you don't want to be forced to buy from the same supplier all
the time.
As XRC is not hardware driven, it is not so fussy. The primary disks have to be IBM DASD (but not RVA); the secondaries can be anything which will accept CKD datastreams. HXRC is the same, except the primary must be HDS DASD.
XRC
XRC is an asynchronous mirroring product. Data is mirrored to a remote site,
but it will generally be a little behind the data at the primary site.
Unlike PPRC, it is software driven from an OS/390 host. It uses System
Data Mover (SDM) to copy the writes. SDM runs as a pair of OS/390 started
tasks. XRC uses a set of files to control and correctly sequence the writes.
If XRC runs over more than one CEC, a sysplex timer is used to correctly
timestamp the updates. All primary & secondary volumes must be online
to the SDM system.
In brief, XRC gets involved after data is written to the primary cache (at which point the application IO is complete). The records are held in SDM datasets, where they are grouped together to maintain correct time sequence, then journalled. Then the data is written off to the secondary disks.
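The time-sequencing step can be pictured with a small sketch. This is not SDM code; the names `Write` and `form_consistency_group`, the integer timestamps and the controller labels are all invented for illustration. It assumes that a group can only include writes up to the earliest "latest timestamp" reported by any controller, since a controller that has reported less far ahead might still deliver an older write.

```python
from typing import NamedTuple


class Write(NamedTuple):
    timestamp: int  # hypothetical sysplex-timer value
    volume: str
    data: str


def form_consistency_group(per_controller):
    """Merge the writes read from each controller into one time-ordered group.

    per_controller maps a controller name to its writes in timestamp order.
    Returns (group, remaining): the writes safe to journal now, and the
    writes held back for the next group.
    """
    # Assumption: wait until every controller has reported at least one write.
    if not per_controller or any(not w for w in per_controller.values()):
        return [], per_controller
    cutoff = min(writes[-1].timestamp for writes in per_controller.values())
    group, remaining = [], {}
    for ctl, writes in per_controller.items():
        group.extend(w for w in writes if w.timestamp <= cutoff)
        remaining[ctl] = [w for w in writes if w.timestamp > cutoff]
    group.sort(key=lambda w: w.timestamp)
    return group, remaining


pending = {
    "ctl_A": [Write(1, "VOL001", "a1"), Write(4, "VOL001", "a2"), Write(9, "VOL002", "a3")],
    "ctl_B": [Write(2, "VOL101", "b1"), Write(5, "VOL101", "b2")],
}
group, pending = form_consistency_group(pending)
print([w.data for w in group])  # ['a1', 'b1', 'a2', 'b2']
```

The write stamped 9 is held back because ctl_B has only reported up to 5, so an earlier ctl_B write could still arrive.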
It's possible for the secondary writes to lag quite a bit behind the primary. If this happens, you can slow the primary I/O rate down for a while to let the secondary catch up.
XRC has the ability to suspend IO temporarily for DR testing.
Dataset issues
For SYSPLEX datasets it's best to mirror the LOGR & WLM datasets, as
you don't want to switch to your secondary site with no WLM policies,
and you don't want to lose your logs either. How busy is your CFRM
file? If it's very busy, consider not mirroring it. It's also best if the
XCF at DR does not know about the original Coupling Facilities. It will
get upset if it thinks they are dead.
At DR, consider creating brand new CFRM and SYSPLEX couple data sets and cold starting the sysplex. SYS1.PARMLIB will be mirrored, so XCF will find the old coupling datasets in the COUPLExx member. You need procedures to decide how to reply to the WTORs which will come up at IPL time, but once you choose the new datasets, all will come up clean.
Some system datasets need to be available, but the content is not critical. Examples are Page Datasets, SMF & SYSRES. You could remote copy them to establish them at the new site, but then suspend them.
However, remember that if you're running with the XRC Datamover at your remote
site, then ALL data goes down the link, even that from suspended volumes.
Only when it gets to the DataMover is it discarded. Consider using the
XRC COPY facility on selected volumes on a regular basis, during a low
activity period, then DROP the volumes from the XRC session to save the
bandwidth.
Other hints
If you do mirroring & backup to a remote site, try to keep XRC & backup links in separate fibres, otherwise your backups could saturate the links.
How do you check if mirroring is working? You can issue CQUERY TSO commands in REXX, but they echo the commands & results to the SYSLOG. Another option is to use an API, the ANTRQST macro, to issue the PPRC and XRC requests and get the responses back. You can get them in the exact TSO message format or in an "unformatted" mode that is easier to process.
The ANTRQST macro is documented in DFSMSdfp 1.5 Advanced Services.
XRC statistics are contained in SMF record 42 subtype 11.
DR angle
To invoke DR (or simulated disaster, for testing), issue an XRECOVER command
to the Data Mover and in about a minute, the DASD at the recovery site
will be re-labeled, and all the data is available. You might have to recover
or recreate those few volumes which you don't mirror. Then you IPL, CLPA
and cold-start JES2.
SRDF
SRDF data replication software from EMC probably has better functionality than
PPRC, but until recently it had one major failing: its command set was
totally different. What? Well, SRDF commands only work on EMC disks. Other
vendors such as HDS & STK took the IBM PPRC command set, and interpreted
it to run their own replication software, so the underlying code is different,
but the command set is the same. This meant that you could run a disk
farm of StorageTek, IBM and HDS disks, and control all the mirroring using
one set of commands. EMC did have a half-way solution; you could run a
mainframe started task that intercepted the PPRC commands and converted
them to SRDF commands before passing them down the channel. This was far
from ideal, and prone to error. However, EMC have now joined the fold;
they will now accept native PPRC commands at the Symmetrix, and convert
them into SRDF commands in the microcode. SRDF is very similar to Timefinder,
except that SRDF works across Symms while Timefinder works within one
Symm.
SRDF mirroring modes
SRDF has 4 modes. The choice basically depends on whether you want the best possible performance, or to be absolutely sure that your data is consistent between sites.
Synchronous
In this mode, a copy of the data must be stored in cache in both local and
remote machines, before the calling application is signaled that the I/O
is complete. This means that data consistency between sites is guaranteed.
If the remote Symmetrix is more than 15km away, then this can significantly
degrade performance.
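To put rough numbers on the distance penalty, the arithmetic below estimates the propagation delay added to each synchronous write. It assumes light travels at about 200,000 km per second in fibre and that one round trip is needed per write; both figures are assumptions for illustration, and real links add protocol and equipment overheads on top.

```python
FIBRE_KM_PER_SEC = 200_000  # assumed: roughly two thirds of the speed of light


def round_trip_ms(distance_km, round_trips=1):
    """Propagation delay added to one synchronous write, in milliseconds."""
    one_round_trip_sec = 2 * distance_km / FIBRE_KM_PER_SEC
    return one_round_trip_sec * 1000 * round_trips


for km in (15, 20, 2000):
    print(f"{km:>5} km: {round_trip_ms(km):.2f} ms added per write")
```

If the protocol needs several exchanges per write, pass a larger `round_trips` value and the delay scales accordingly.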
When SRDF mirroring is running in SYNC mode, it is also possible to switch on the ‘domino effect’. If you then get a problem with a disk or the SRDF links so that mirroring cannot proceed, the Symm places the other disk into ‘not ready’ mode, so it cannot be accessed by the host until the problem is fixed.
Semi-synchronous
The data on a secondary logical volume can be one write I/O behind the primary,
which may sound almost as good as Synchronous, but Semi-synch will not
give you I/O consistency across volumes. The local Symmetrix will return
Channel end / Device end once a write I/O is safely in the local cache,
and then it sets the logical volume to busy status, so it will not accept
any more writes. Then SRDF passes the write I/O to the remote Symmetrix,
and once it is safely stored in cache there, the busy flag is removed
from the logical volume.
The advantage of Semi-synch is that the application does not have to wait for the remote I/O to complete, so performance does not suffer.
The disadvantage is that in a disaster there is no guarantee that all the I/Os that an application thinks it completed actually made it to the remote site. There could be several write I/Os queued up in the local controller (one for each logical disk) and these are processed by a FIFO queue. If an application is sending I/Os to more than one controller, there is no FIFO synchronisation between controllers so the remote data could be inconsistent.
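A toy simulation makes the exposure concrete. The `Controller` class and the two records are invented for illustration, and the per-volume busy flag is not modelled; the point is only that each controller ships its own queue without reference to the other.

```python
from collections import deque


class Controller:
    """Toy local controller: acknowledges a write at once, ships it later."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()  # FIFO of writes not yet at the remote site

    def write(self, record):
        self.queue.append(record)  # the host sees the I/O as complete here

    def ship_one(self, remote):
        if self.queue:
            remote.append(self.queue.popleft())


remote = []
ctl1, ctl2 = Controller("ctl1"), Controller("ctl2")

# The application writes these in order and believes both are complete.
ctl1.write("1: debit account")
ctl2.write("2: credit account")

# ctl2's link happens to be quicker; disaster strikes before ctl1 ships.
ctl2.ship_one(remote)

print(remote)            # ['2: credit account']
print(list(ctl1.queue))  # ['1: debit account']
```

The remote site holds the second write without the first, which is exactly the inconsistency described above.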
Adaptive copy - write pending
Data is written asynchronously to the secondary device and can be up to 65535 I/Os behind the primary. Tracks which have not yet been copied are called ‘dirty tracks’, and the number of dirty tracks permitted is set by a 'skew value' parameter. If the skew value is exceeded, then the mode switches to Synchronous or Semi-synchronous until the remote Symm catches up. At that point, it switches back to adaptive copy - write pending mode. Adaptive copy is useful where sites are too far apart for synchronous operation, and some data loss is acceptable.
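The mode switching can be sketched as a small state machine. The function name and the event encoding are invented for illustration, and the sketch assumes that "catches up" means the dirty track count has dropped back to the skew value or below; the real threshold may differ.

```python
def adaptive_copy(events, skew, fallback="sync"):
    """Follow the mirroring mode as tracks are dirtied and copied.

    events: iterable of integers, +n for n tracks dirtied by host writes,
            -n for n tracks copied to the remote site.
    Returns a list of (dirty_tracks, mode) recorded after each event.
    """
    dirty, mode, history = 0, "adaptive", []
    for delta in events:
        dirty = max(0, dirty + delta)
        if mode == "adaptive" and dirty > skew:
            mode = fallback    # skew exceeded: fall back to sync or semi-sync
        elif mode == fallback and dirty <= skew:
            mode = "adaptive"  # assumed meaning of 'caught up'
        history.append((dirty, mode))
    return history


for dirty, mode in adaptive_copy([+2, +2, -1, -1, +1], skew=3):
    print(dirty, mode)
```

With a skew of 3, the second burst of writes pushes the count to 4 and forces the fallback mode until one track has been copied.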
Adaptive copy - disk
This mode is intended for electronically moving data between sites. There is no I/O consistency across volumes; data is simply moved without any acknowledgment.
SRDF Volume terminology
Source volume – the production volume that is accessed by the user, equivalent to a PPRC primary volume
Target volume – the mirrored copy of a source volume, equivalent to a PPRC secondary volume
Local volume – simply a non-mirrored volume. (EMC sometimes use the term ‘mirroring’ to describe RAID1 protection, which can cause confusion, as a local volume can be RAID1 mirrored)
SRDF Logical volume states
Volumes can be in three possible states.
Not ready (NR) – can’t be accessed by the host at all
Write Disabled (RO) – can be accessed by the host for read only
Write enabled (RW) – can be accessed by the host for read and write
The actual status of a volume depends on its SRDF state, and its Channel interface state. A Source volume has six different possible combinations of states, and a target volume has nine.
The desirable state for a source volume is SRDF state=RW and CI state=RW, so volume state=RW.
If a source volume's CI state is RW but its SRDF state is NR, then it may be possible to access the data from the target volume, if that is in the correct state.
The desirable state for a target volume is SRDF state=RO and CI state=RW, so volume state=RO.
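Both examples above are consistent with a simple rule: the host sees the more restrictive of the SRDF state and the Channel interface state. That rule is an assumption drawn from those two examples, not a statement of how the Symmetrix resolves every combination, and the function name is invented for illustration.

```python
# Ordered from most to least restrictive.
RANK = {"NR": 0, "RO": 1, "RW": 2}


def volume_state(srdf_state, ci_state):
    """Return the more restrictive of the two states (assumed rule)."""
    return min(srdf_state, ci_state, key=RANK.__getitem__)


print(volume_state("RW", "RW"))  # desirable source volume
print(volume_state("RO", "RW"))  # desirable target volume
print(volume_state("NR", "RW"))  # source volume with SRDF state NR
```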
Consistency Groups
A Consistency Group is a collection of volumes in one or more Symmetrix devices that need to be kept in a consistent state. If a write to a Symmetrix cannot be propagated to the remote site, the Symmetrix will hold the I/O for a fixed period of time. At the same time it presents a SIMM back to the host. The Congroup STC will detect the SIMM and issue the equivalent of PPRC FREEZE to all the other Symmetrix online to that host. All volumes in that Consistency Group will then be suspended. Once they are all suspended, the equivalent of PPRC RUN is issued and I/O can complete, including the first I/O that triggered the SIMM.
Consistency Group processing with SRDF does not lose data because it employs a FREEZE/RUN approach similar to PPRC FREEZE/RUN.
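The ordering is what matters here, and a short sketch shows it. The `Volume` class, its two flags and the log entries are invented for illustration; this models the sequence described above, not the Congroup started task itself.

```python
class Volume:
    def __init__(self, name):
        self.name = name
        self.mirroring = True  # writes are being propagated to the remote site
        self.frozen = False    # host I/O is being held


def trip_consistency_group(volumes, log):
    """Freeze every volume, suspend mirroring on all, then release host I/O."""
    for v in volumes:  # FREEZE: hold host I/O everywhere first
        v.frozen = True
    log.append("freeze")
    for v in volumes:  # nothing further reaches the remote site
        v.mirroring = False
    log.append("suspended")
    for v in volumes:  # RUN: held I/O, including the trigger, now completes
        v.frozen = False
    log.append("run")


group = [Volume("VOL001"), Volume("VOL002"), Volume("VOL003")]
log = []
trip_consistency_group(group, log)
print(log)  # ['freeze', 'suspended', 'run']
```

Because no volume accepts new writes until every volume has stopped mirroring, the remote copies all stop at the same point in time.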
To create a consistency group, you use the command
symdg create group_name -type regular
Once you create a consistency group, you can use composite SRDF commands to control all the disks in that group. For example
symrdf -g group_name failover
You can use this command to fail an entire consistency group over to the DR
site. It will Write Disable the source volumes, set the link to Not Ready
and Write Enable the target volumes.
To fail back, that is, to restore service to your primary site, use the command
symrdf -g group_name failback
This will write disable the target (remote) disks, suspend the RDF link, merge changed disk tracks, resume the link then write enable the source disks.
While failback is in progress, you do not have a remote DR position. You can speed the failback operation up by copying invalid tracks before write disabling any disks with the command
symrdf -g group_name update
If you want to split the SRDF managed disks, that is stop mirroring and allow the disks at both sites to be updated independently, then you need the split command. This suspends the RDF link and write-enables the target disks.
symrdf -g group_name split
And once you do this, you will probably want to go back to an SRDF mirrored state again, so you need the establish command
symrdf -g group_name -full establish
This will write-disable the target disks, suspend the RDF link, copy data from source to target, then resume the RDF link.
The restore command does this the other way around. It will copy the data from the target disk back to the source. The command is
symrdf -g group_name -full restore
This write disables both source and target disks, suspends the RDF link, merges the track tables, resumes the RDF link, then write enables the source (R1) disks.
Other useful commands, which should be self-explanatory, are:
symrdf -g group_name suspend
symrdf -g group_name resume
symrdf -g group_name set mode sync
symrdf -g group_name set domino on
symrdf -g group_name set acp_disk skew 1000
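If you script these operations, it can help to build the command lines in one place so that a typing slip cannot turn a split into a failover. The helper below is a minimal sketch: the function names are invented, it uses only the actions and the -full flag exactly as they appear in the commands above, and it only builds the command lines without running anything.

```python
import shlex

# Actions taken from the commands in this section.
FULL_COPY_ACTIONS = {"establish", "restore"}
ACTIONS = {"failover", "failback", "update", "split", "suspend", "resume"} | FULL_COPY_ACTIONS


def symrdf_command(group_name, action):
    """Return the argument list for one group-level symrdf action."""
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    argv = ["symrdf", "-g", group_name]
    if action in FULL_COPY_ACTIONS:
        argv.append("-full")  # as written in the establish and restore examples
    argv.append(action)
    return argv


def split_and_remirror(group_name):
    """Hypothetical plan: split the group, then go back to a mirrored state."""
    return [symrdf_command(group_name, a) for a in ("split", "establish")]


for argv in split_and_remirror("group_name"):
    print(shlex.join(argv))
```

The returned lists can be handed to whatever job scheduler or automation tool you use, after a human has checked them.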
HXRC, HRC & HARC
HXRC (Hitachi eXtended Remote Copy) is fully compatible with XRC. The 7700 or 7700E can be the source, and anything (IBM, EMC, STK or HDS) can be the remote volume.
HRC (Hitachi Remote Copy) is fully compatible with PPRC. It works between 7700 and 7700E. To gain write throughput with HRC you may need to specify multiple consistency groups. This can become complex to manage.
HARC (Hitachi Asynch Remote Copy) is, in effect, XRC without a server. It must run on 7700E and/or 9900s.
TDMF
It is possible to mirror data using TDMF (Transparent Data Migration Facility
from Amdahl). It is a software solution and is completely independent
of the disks, either primary or secondary.
Inter-SDS
Inter-SDS is part of DataCore's SANsymphony suite. It will work at distances up to 100 km with fibre channel and optical extenders. The second site includes a snapshot copy called a Rapid Recovery Volume (RRV) for instant backup. Both sites need a DataCore Storage Domain Server to control the mirroring, and a Fibre Channel switch. The big advantage of Inter-SDS is that it supports most disk storage devices.
The RRVs are dual purpose. The idea is to flush the cache at the primary site on a regular basis (daily?), with the applications stopped, then create a set of RRVs. The secondaries protect from physical corruption, and the RRVs protect from logical corruption, with some data loss, but minimal restore time. The RRVs are also available for tape backup, so minimising application downtime to a few seconds.
DataCore also offer an asynchronous IP Mirroring product, AIM. AIM requires
source & target SDS boxes at both sites, and uses standard TCP/IP. This
means it can run over any existing IP network. Obviously, if network
bandwidth is restricted, then the target data can lag a long way behind
the source. Write IOs will always be synchronized on a single secondary
volume, so although the data will lag behind the primary, it will be
consistent. Write order is not maintained over multiple volumes. AIM
can mirror to several target volumes at once.
Bi-Directional Data Replication Manager (DRM)
Part of Compaq's SANworks suite, DRM works in synchronous or asynchronous mode,
and supports Compaq disk devices. It can mirror in both directions,
so two production sites can be used to mirror each other. If the copy
link is broken, mirroring is automatically synchronized when the link
is recovered. DRM has a snapshot capability for creating point in time
copies of data for offline backups.