Mirroring data to a remote site is a good starting point for coping with a disaster, but much more is needed than just simple mirroring. How do you detect that problems are developing at your primary site? How do you suspend mirroring if problems occur, so that your remote site data is in a consistent state? What about switching applications and services? To some extent, this can be managed with 'stretched' clusters, for example a Windows server cluster with some of the servers at a remote site, but this does not cater for managing services that run on different platforms.
GDPS is about Business Resilience: the ability of IT systems to recover quickly from a disaster, or to switch services quickly between datacenters to reduce the downtime incurred by planned outages. GDPS should provide almost continuous application availability. The Recovery Time Objective (RTO), the time it takes to recover services, is very short for GDPS, say between 1 and 2 hours. The Recovery Point Objective (RPO), the data loss time, ranges between zero and a few minutes, depending on the GDPS configuration used. GDPS simplifies the management of data mirroring, as you can process whole groups of disks, or systems, with a single command.
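To make the two objectives concrete, here is a minimal Python sketch that measures the RPO and RTO actually achieved in an outage and compares them with targets. It is not part of GDPS: the function names, the timestamps and the target values (a 5 minute RPO, a 2 hour RTO) are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_mirrored_write: datetime, failure_time: datetime) -> timedelta:
    """Data-loss window: updates made after the last mirrored write are lost."""
    return max(failure_time - last_mirrored_write, timedelta(0))


def achieved_rto(failure_time: datetime, service_restored: datetime) -> timedelta:
    """Outage length: time from the failure until services are back."""
    return service_restored - failure_time


# Hypothetical outage timeline (illustrative values only).
failure = datetime(2024, 1, 1, 10, 0, 0)
last_mirrored = datetime(2024, 1, 1, 9, 58, 30)
restored = datetime(2024, 1, 1, 11, 15, 0)

# Hypothetical targets, loosely based on the ranges quoted in the text.
rpo_target = timedelta(minutes=5)
rto_target = timedelta(hours=2)

rpo = achieved_rpo(last_mirrored, failure)
rto = achieved_rto(failure, restored)

print(f"RPO achieved: {rpo}, within target: {rpo <= rpo_target}")
print(f"RTO achieved: {rto}, within target: {rto <= rto_target}")
```

With synchronous mirroring the last mirrored write and the failure coincide, so the achieved RPO in this sketch would be zero.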
GDPS was originally developed for IBM mainframes using PPRC or Metro Mirror between two sites. However, mainframe data is a relatively small part of the picture now, compared to 10 years ago, and applications run on different platforms. There is also a requirement for 3-site solutions, where two datacenters run at Metropolitan distances, but a third, very remote site exists for extreme DR. GDPS now comes in the following flavours:
GDPS supports non-IBM storage devices, as long as they are compatible with PPRC.
GDPS consists of a number of components: the base component, plus RCMF and PSMF.
RCMF (Remote Copy Management Facility) was ISPF panel driven, but now also has a GUI, which makes it easier to use. RCMF simplifies the management of PPRC disks, as we can manage the whole configuration with single commands, instead of working with one volume at a time.
PSMF (Parallel Sysplex Management Facility) is a panel driven system that allows you to swap between parallel sysplex configurations.
GDPS makes it easy to control site management. It goes way beyond what a Storage person would normally do, and it automates complete site switching using simple panel options. Two main scenarios are offered -
A planned outage, where GDPS will
The recovery site can then be "IPLed," and the network switched to make the applications available from the recovery site.
GDPS principles are illustrated in the GIF below
If your production site goes belly up, GDPS will freeze the remote copy to maintain data integrity. It is crucial that the data used after a disaster is consistent. If parts of databases are out of step with each other, then they will have to be recovered from backups and logs. This takes a long time. GDPS will freeze all data on all storage systems at the same point, so a GDPS recovery should typically take 30-60 minutes.
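The reason for freezing everything at the same point can be shown with a small simulation. This is only a sketch of the principle, not of how GDPS or PPRC is implemented: the `MirroredVolume` class, the volume names `LOG001` and `DAT001`, and the idea of representing each write as a transaction number are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MirroredVolume:
    """A primary volume with a remote copy (hypothetical model)."""
    name: str
    primary: list = field(default_factory=list)
    secondary: list = field(default_factory=list)
    suspended: bool = False

    def write(self, txn_id: int) -> None:
        # Every write lands on the primary; it reaches the secondary
        # only while mirroring for this volume is still active.
        self.primary.append(txn_id)
        if not self.suspended:
            self.secondary.append(txn_id)


def run_scenario(freeze_all: bool) -> dict:
    """Return the secondary-site contents after a mirroring failure."""
    log = MirroredVolume("LOG001")
    data = MirroredVolume("DAT001")
    volumes = [log, data]

    # Transaction 1 updates the log and the database; both are mirrored.
    for vol in volumes:
        vol.write(1)

    # Mirroring for the log volume fails.
    log.suspended = True
    if freeze_all:
        # Freeze: suspend mirroring on every volume at the same point.
        for vol in volumes:
            vol.suspended = True

    # Transaction 2 arrives while production is still running.
    for vol in volumes:
        vol.write(2)

    return {vol.name: list(vol.secondary) for vol in volumes}


def is_consistent(secondary: dict) -> bool:
    """The remote copy is usable if every volume holds the same transactions."""
    images = list(secondary.values())
    return all(image == images[0] for image in images)


frozen = run_scenario(freeze_all=True)
unfrozen = run_scenario(freeze_all=False)
print("with freeze:   ", frozen, is_consistent(frozen))
print("without freeze:", unfrozen, is_consistent(unfrozen))
```

Without the freeze, the database volume at the remote site contains transaction 2 but the log volume does not, which is exactly the out-of-step condition that forces a recovery from backups and logs.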
The disks will be swapped to the subsystems in the secondary site, and, from this point, the recovery will continue as described in the planned reconfiguration.
A freeze command will be issued if GDPS detects any hardware errors in the system. Exactly what happens next will depend on what you ask GDPS to do. You can ask for -
GDPS now has a set of z/OS dialogs which are used to create GDPS policy options. One of these is GEOPLEX OPTIONS, which controls various functions within GDPS, including whether or not HyperSwap can be invoked automatically, and whether tape errors can cause a DASD freeze.