Figure 1: classic recovery avoidance through basic local mirroring with disaster recovery via remote snapshot
The storage management example in Figure 1 looks at
what might be expected of today's storage
management. The basic assumption is that a mission-critical
application suite, Application A, is running on the
processor, accessing and updating data held on storage
Device D. As the data is mission-critical it would be
reasonable to have it mirrored within D. In the (unlikely)
event of a disc in D failing and taking some of the prime
data with it, the device's own management will switch to
the mirror and rebuild the other copy of the data on a spare disc, so that
the failed unit can be replaced with no loss of service levels.
That's fine and, apart from a very short period during
the switching and rebuilding, the data is not exposed to a
single point of failure.
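The switch-and-rebuild behaviour described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class name `MirroredDevice`, its methods and the use of dictionaries as "discs" are hypothetical and do not represent any vendor's interface.

```python
from dataclasses import dataclass, field


@dataclass
class MirroredDevice:
    """Hypothetical model of Device D: two copies of the data plus spare discs."""
    primary: dict = field(default_factory=dict)
    mirror: dict = field(default_factory=dict)
    spares: int = 1

    def write(self, key, value):
        # Every update is applied to both copies.
        self.primary[key] = value
        self.mirror[key] = value

    def read(self, key):
        return self.primary[key]

    def fail_primary_disc(self):
        # Service switches to the surviving copy, then a second copy
        # is rebuilt on a spare disc so the data is mirrored again.
        if self.spares < 1:
            raise RuntimeError("no spare disc available for rebuild")
        self.primary = self.mirror
        self.spares -= 1
        self.mirror = dict(self.primary)  # independent rebuilt copy
```

Between the switch and the end of the rebuild there is only one copy, which is the short exposure window mentioned in the text.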
To give the required degree of disaster recovery (DR)
resilience, the system takes a periodic snapshot (by
whatever means is supported on Device D) on the
remote device, R. Each remote snapshot can overwrite the
previous one because, by definition, its purpose is to
allow recovery to the most recent checkpoint. The snapshot
is asynchronous, but with today's technology the lag
between the live data and the snapshot is very short.
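As a minimal sketch of the overwrite-in-place snapshot scheme, assume the remote device keeps exactly one snapshot and a timestamp. The names `RemoteDevice`, `store` and `recover` are hypothetical, chosen for illustration only.

```python
import copy


class RemoteDevice:
    """Hypothetical model of Device R: holds only the latest snapshot."""

    def __init__(self):
        self.snapshot = None
        self.taken_at = None

    def store(self, data, now):
        # Each new snapshot overwrites the previous one.
        self.snapshot = copy.deepcopy(data)
        self.taken_at = now


def recover(remote):
    # Recovery returns the state as of the last snapshot; updates made
    # after that point (the asynchronous lag) are lost.
    if remote.snapshot is None:
        raise LookupError("no snapshot has been taken yet")
    return copy.deepcopy(remote.snapshot)
```

The shorter the interval between calls to `store`, the less data is lost on recovery.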
Figure 2: protection against hardware-induced single point of failure
Now, let's suppose that D suffers a failure in one of its twinned power supplies (Figure 2). Hardware failures are
rare now, but this type of failure is probably the most
likely to occur in the field. All the data on D is now
exposed to a single point of failure - the remaining power
supply. Mirroring within D no longer gives any recovery
avoidance cover, so the mirror needs to be switched to
Device E until the failed power supply has been
replaced, at which point mirroring within D can be
reinstated. DR cover is unchanged.
A SAN would obviously make it simpler to switch
mirroring to another device with no loss of performance. In
an ideal world, it would be possible for Devices D and E
to come from different vendors. Leaving multi-vendor support aside,
trapping the power supply failure and using it to trigger
the rest of the operations is not difficult. There is nothing to
stop today's technology from doing this completely
automatically - at least within a homogeneous storage
environment - though none of the major storage vendors
would appear to take this level of function for granted, let
alone include it as part of their current offering.
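The automatic response described above amounts to a small event handler. The sketch below assumes two hypothetical event names, `psu_failed` and `psu_replaced`, and a hypothetical `StorageState` record. It illustrates the control logic only and is not drawn from any product.

```python
from dataclasses import dataclass


@dataclass
class StorageState:
    healthy_psus: int = 2      # Device D has twinned power supplies
    mirror_target: str = "D"   # where the mirror copy currently lives


def on_event(state, event):
    """Trap a power supply event and move the mirror accordingly."""
    if event == "psu_failed":
        state.healthy_psus -= 1
    elif event == "psu_replaced":
        state.healthy_psus += 1
    else:
        raise ValueError(f"unknown event: {event}")
    # With fewer than two healthy supplies D is a single point of failure,
    # so the mirror moves to Device E; it returns to D once D is repaired.
    state.mirror_target = "D" if state.healthy_psus >= 2 else "E"
    return state
```

A failure followed by a replacement takes the mirror from D to E and back again, with no operator involvement.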