Figure 3: protection against application-induced data corruption
The next stage of evolution is shown in Figure 3.
We have agreed that hardware failures are rare. Unfortunately, application errors and operational hiccups are not so rare: between them they account for four out of every five occurrences of data non-availability. In this case, let's assume that our mission-critical application, Application A, is subject to a major revision. The exposure to non-availability here comes from the possibility that some data corruption will result, with the added complication that this corruption may not be discovered until some time after it has occurred. To achieve the quickest possible recovery, the system needs to be able to roll back to the last snapshot taken before the application error caused a problem. Logically, then, we need to take a series of rolling snapshots, none of which overwrites its predecessors. So, the switch of operation from the old to the new version - from Application A Rev 1 to Rev 2 - needs to trigger the taking of a series of snapshots. There is no reason why these snapshots should not be lodged in the home device, but let's assume that it is a high-performance device and we want to conserve its capacity; we'll take rolling snapshots to a different, local device.
If rollback becomes necessary, channel management enables us to redirect production data access to Device E. Mirroring within the box must continue as before, as must remote snapshotting for DR.
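The rolling-snapshot idea can be illustrated with a minimal sketch. It assumes an in-memory model in which each snapshot carries a logical timestamp; the names `Snapshot`, `RollingSnapshots`, `take` and `last_before` are hypothetical and do not correspond to any vendor's storage management interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Snapshot:
    taken_at: int           # logical timestamp of the snapshot (assumed)
    data: Dict[str, str]    # point-in-time copy of the production data


@dataclass
class RollingSnapshots:
    """Append-only series: a new snapshot never overwrites an older one."""
    snapshots: List[Snapshot] = field(default_factory=list)

    def take(self, taken_at: int, production: Dict[str, str]) -> None:
        # Copy the data so later production writes cannot alter the snapshot.
        self.snapshots.append(Snapshot(taken_at, dict(production)))

    def last_before(self, corrupted_at: int) -> Optional[Snapshot]:
        # Latest snapshot taken strictly before the corruption began.
        older = [s for s in self.snapshots if s.taken_at < corrupted_at]
        return max(older, key=lambda s: s.taken_at) if older else None


# Illustrative run: corruption starts at time 3 but is noticed only later.
production = {"balance": "100"}
series = RollingSnapshots()
series.take(1, production)
production["balance"] = "120"
series.take(2, production)
production["balance"] = "garbage"    # the faulty revision corrupts the data
series.take(3, production)
series.take(4, production)           # later snapshots also hold corrupt data

good = series.last_before(3)         # the rollback target
```

Because no snapshot overwrites an earlier one, a clean copy still exists even though two corrupt snapshots were taken before the problem was noticed. In the scenario above, production access would then be redirected to the device holding that copy.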
The same comments as above apply about the opportunity afforded by SANs and the desirability of being able to carry out storage management functions between different vendors' boxes.
To extend the scenario a little, let's assume that A Rev 2 is accompanied by a modified data format, Data Rev 2. It is perfectly reasonable to decide that any problem with the new version of the application will cause reversion to the previous version, so that production processing can continue. This complicates the picture further if we want to protect against the new version of the application corrupting the data.
Fortunately, there is no reason why our storage/data management regime should not be able to cope with these refinements, so that the remote snapshot and local mirror (obviously) conform to the new data format, while the local, rolling snapshots are in the old data format, Data Rev 1. Overall, the switch to the previous version of the application can be done with minimal disruption.
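As a rough sketch of the bookkeeping this implies, the following assumes each copy of the data is tagged with the format it holds, so that a reversion can pick a copy the old application revision can read. The `Copy` record, the `FORMAT_FOR_APP` table and the `copy_for` helper are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Copy:
    name: str           # hypothetical label for a copy of the data
    data_format: str    # "Data Rev 1" or "Data Rev 2"
    taken_at: int       # logical time at which the copy was made (assumed)


# Assumed mapping from application revision to the data format it reads.
FORMAT_FOR_APP = {"A Rev 1": "Data Rev 1", "A Rev 2": "Data Rev 2"}


def copy_for(app_revision: str, copies: List[Copy]) -> Optional[Copy]:
    """Return the most recent copy whose format the given revision can read."""
    wanted = FORMAT_FOR_APP[app_revision]
    readable = [c for c in copies if c.data_format == wanted]
    return max(readable, key=lambda c: c.taken_at) if readable else None


copies = [
    Copy("rolling snapshot 1", "Data Rev 1", 1),
    Copy("rolling snapshot 2", "Data Rev 1", 2),
    Copy("local mirror", "Data Rev 2", 3),
    Copy("remote snapshot", "Data Rev 2", 3),
]

# Reverting to the old application selects the newest old-format snapshot.
fallback = copy_for("A Rev 1", copies)
```

Here `fallback` is the second rolling snapshot, the newest copy still in Data Rev 1, while the mirror and remote snapshot remain in the new format.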
This type of management is particularly apposite as application developers come under extreme pressure to deliver new or extended function in short timescales - mirroring the commercial time-to-market imperative - with inevitable shortening of testing cycles. Though there are still advocates of application prototyping, run against slim workloads, scalability problems with non-mainframe platforms make this a dangerous ploy. Only running in a genuine production environment provides a valid test.
Despite the hoped-for improvements that Y2K led to in generating and managing testing environments, even the basic form of this scenario is beyond the ability of today's storage management functions. Developing the mechanisms to carry out this sort of exercise automatically will certainly take more than two years and could take as long as four. The advanced form is at least a year further away.