VPLEX is EMC's SAN virtualisation manager. It is deployed as a cluster of engines, though it is possible to have a single engine cluster. The architecture is in-band which means that all IO flows through the VPLEX. The cluster itself consists of between one and four pairs of directors, and any director can fail over to any other director in the cluster if any component fails. The back end storage is open, that is, it can be EMC and non-EMC.
The engines contain IO directors and a Linux kernel that runs proprietary cluster software. The VPLEX components use the same standard building blocks that EMC uses in its Symmetrix products. The two IO directors in each engine run on a virtualised x86 processing unit and have 16 or 32 Fibre Channel ports, split between front end and back end.
A storage volume is simply a physical storage LUN that exists in one of the 'back end' storage controllers. The VPLEX carves this volume up into extents, then combines these extents with extents from other storage volumes to create the virtual volumes that are presented to the servers.
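The carving and recombining of extents described above can be sketched as a small data model. This is only an illustration of the idea, not the VPLEX implementation: the class names, the fixed extent size, the block numbering and the volume names (`array_a_lun0`, `array_b_lun7`, `vvol_app1`) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Extent:
    """A contiguous slice of one back end storage volume (hypothetical model)."""
    storage_volume: str
    start_block: int
    num_blocks: int


@dataclass
class VirtualVolume:
    """A virtual volume built by concatenating extents in order."""
    name: str
    extents: List[Extent]

    @property
    def size_blocks(self) -> int:
        return sum(e.num_blocks for e in self.extents)

    def resolve(self, virtual_block: int) -> Tuple[str, int]:
        """Map a virtual block number to (storage volume, physical block)."""
        if virtual_block < 0:
            raise IndexError("block numbers start at 0")
        offset = virtual_block
        for extent in self.extents:
            if offset < extent.num_blocks:
                return extent.storage_volume, extent.start_block + offset
            offset -= extent.num_blocks
        raise IndexError("block is beyond the end of the virtual volume")


def carve(storage_volume: str, total_blocks: int, extent_blocks: int) -> List[Extent]:
    """Split one storage volume into extents; the last one may be shorter."""
    return [
        Extent(storage_volume, start, min(extent_blocks, total_blocks - start))
        for start in range(0, total_blocks, extent_blocks)
    ]


# Two back end LUNs from different arrays, carved into extents.
array_a = carve("array_a_lun0", total_blocks=1000, extent_blocks=400)
array_b = carve("array_b_lun7", total_blocks=600, extent_blocks=300)

# One virtual volume that combines an extent from each array.
vvol = VirtualVolume("vvol_app1", [array_a[0], array_b[1]])
```

A server that reads block 450 of `vvol_app1` never sees the mapping; in this model `vvol.resolve(450)` lands 50 blocks into the second extent, which lives on the second array.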
So, externally, the 'front end' of the VPLEX looks like a storage controller, as far as the connecting servers are concerned. The 'back end' is connected to actual storage controllers, and to them it looks like a host.
A VPLEX can migrate data between different devices or between different extents, without affecting any running user applications.
Write caching works differently depending on the VPLEX type. With VPLEX Local and Metro, writes are cached in the VPLEX, but a write is not marked as complete until it has been successfully written to the back end storage arrays. Because VPLEX Geo can have long response times from the remote cluster, a write is acknowledged as complete as soon as it is cached.
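The difference between the two acknowledgement policies can be shown with a toy model. This is a sketch under stated assumptions, not VPLEX code: `VirtualiserModel`, `BackendArray` and the event strings are invented for illustration, and the two modes are simply labelled write-through (the Local and Metro behaviour described above) and write-back (the Geo behaviour).

```python
from enum import Enum
from typing import Dict, List


class CacheMode(Enum):
    WRITE_THROUGH = "write-through"  # acknowledge after the arrays have the data
    WRITE_BACK = "write-back"        # acknowledge once the data is in cache


class BackendArray:
    """Stand-in for a back end storage array (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[int, bytes] = {}

    def write(self, block: int, data: bytes) -> None:
        self.blocks[block] = data


class VirtualiserModel:
    """Toy model of the acknowledgement order only; no real I/O is done."""

    def __init__(self, mode: CacheMode, arrays: List[BackendArray]) -> None:
        self.mode = mode
        self.arrays = arrays
        self.cache: Dict[int, bytes] = {}
        self.dirty: List[int] = []  # cached blocks not yet on the arrays

    def write(self, block: int, data: bytes) -> List[str]:
        """Return the ordered list of steps taken before the host sees an ack."""
        events = ["cached"]
        self.cache[block] = data
        if self.mode is CacheMode.WRITE_THROUGH:
            for array in self.arrays:
                array.write(block, data)
                events.append(f"written:{array.name}")
        elif block not in self.dirty:
            self.dirty.append(block)
        events.append("acknowledged")
        return events

    def flush(self) -> int:
        """Destage dirty blocks to every array; return how many were flushed."""
        flushed = len(self.dirty)
        for block in self.dirty:
            for array in self.arrays:
                array.write(block, self.cache[block])
        self.dirty.clear()
        return flushed
```

In write-through mode the host waits for every array before the acknowledgement; in write-back mode the acknowledgement comes first and the data is only safe on disk after a later `flush()`, which is why the write-back window carries more risk.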
Three types of VPLEX configuration exist.
For configuration details see the VPLEX configuration page
The VPLEX Metro system contains two clusters, with each cluster having one, two, or four engines, although the two clusters need not have the same number of engines. The two clusters can be in the same datacenter or in two datacenters, as long as they are not too far apart. The important thing is that the round-trip (RTT) communication latency between them needs to be within roughly 5-10 ms.
If you run with both clusters in the same datacenter then you get a high availability, active-active local solution that should survive the loss of one cluster. If you span between datacenters then you stretch that high availability and your applications should survive the loss of one datacenter.
While the VPLEX data disks are spread inside a datacenter or even between datacenters, you want your hosts to think they are accessing data from a traditional disk. This is achieved by the VPLEX cache memory, where the memory in each director is combined to form a distributed VPLEX cache. Every director within a cluster can service an I/O request for any virtual volume served by that cluster. Every director within a cluster is connected to the same set of physical storage volumes from the back-end arrays and has the same virtual-to-physical storage mapping metadata for its volumes. However, the cache algorithms are designed so that each director accesses local disks wherever possible, so minimizing inter-director messaging. The global cache also ensures that a director always accesses the most recent consistent data for a given volume.
VPLEX Metro uses the write-through cache mode, which means that an application must wait until data is written to both cache and disk storage before the write is considered complete. This has a performance impact, but it is much safer than treating the write to cache as completion and writing the data to disk later, and it guarantees that both local and remote copies of data are consistent.
One of the issues facing an active-active cluster design is what to do if one side of the cluster fails, or if the comms links fail. VPLEX storage disks can be aggregated together into consistency groups, so that the I/O activity will be consistent within that group, even after a failure. The cluster sides are marked as 'preferred' and 'non-preferred', so if the non-preferred site goes down, processing will continue at the preferred site. However, if it is not possible to find out which site has gone down, usually because the links have failed, then processing should stop at both sites. If it does not stop, both sites will continue to update local data independently and data corruption could well happen.
VPLEX Metro resolves this problem with VPLEX Witness, a third server running as a separate VM, which is connected to each cluster side independently. This VM needs to be hosted on a hypervisor that does not depend on any cluster volume, and configured in such a way that it would not be affected by any cluster failure. We still designate one cluster as the preferred cluster for data availability. Now, if the comms links between the VPLEX clusters fail but the connections to the VPLEX Witness are retained, the VPLEX Witness will indicate to the clusters that the preferred cluster should continue providing service to the volumes in the consistency group. The non-preferred cluster will stop servicing the volumes to ensure data consistency, and this will continue until the link issue is resolved and the data is correctly mirrored again and consistent in both sites. So now we have three failure scenarios and three safe actions to resolve them, provided the volumes are in a consistency group.
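The arbitration described above can be sketched as a decision function evaluated by each cluster for one consistency group. This is a simplified model, not the actual VPLEX Witness protocol: the function, its parameters and the `Action` values are invented for illustration. Two branches go beyond what the text states and are assumptions: a cluster that can reach neither its peer nor the witness suspends, and the non-preferred cluster carries on if the witness reports that the preferred cluster is down.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue servicing I/O"
    SUSPEND = "suspend I/O"


def decide(is_preferred: bool,
           peer_link_up: bool,
           witness_link_up: bool,
           witness_sees_peer: bool) -> Action:
    """Decide what one cluster does for a consistency group (hypothetical model)."""
    if peer_link_up:
        # Normal operation: both sides are mirroring, nothing to arbitrate.
        return Action.CONTINUE
    if not witness_link_up:
        # Assumption: an isolated cluster cannot tell a peer failure from a
        # link failure, so it stops rather than risk divergent updates.
        return Action.SUSPEND
    if witness_sees_peer:
        # Only the inter-cluster link failed: the preferred side wins,
        # the non-preferred side stops until the mirror is rebuilt.
        return Action.CONTINUE if is_preferred else Action.SUSPEND
    # The witness cannot see the peer either, so the peer is treated as failed.
    # For the non-preferred cluster this branch is an assumption of the sketch.
    return Action.CONTINUE


# Inter-cluster link fails, both clusters still reach the witness.
preferred = decide(True, peer_link_up=False, witness_link_up=True, witness_sees_peer=True)
non_preferred = decide(False, peer_link_up=False, witness_link_up=True, witness_sees_peer=True)
```

The property that matters is that the two sides never both continue while they cannot see each other; in the link-failure case above, exactly one side keeps servicing I/O.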
VPLEX Virtual Edition is implemented on a VMware ESXi infrastructure as a VMware vApp. It extends the distance over which VMware can operate, to provide failover and resilience between data centers. Tools like VMware vMotion can be used to move VMs transparently between data centers.