Storage Virtualization

How do virtual SANs, ILM and Tiered Storage fit together? Can they deliver real benefit to your business?

Tiered storage is a new term to describe an old concept. The idea is that you have storage devices with different capacity, throughput, access speeds, connectivity and price. Expensive storage is fast and highly available. Cheap storage is slow and may be less reliable. The storage tiers will include different types of disk, tape and possibly optical storage. This arrangement is sometimes called the Storage Continuum and was called the Storage Triangle several years ago.

ILM (Information Lifecycle Management) is a combination of storage tiers, software to manage the storage, and rules that decide how that management will work as the value of data changes. This section is about the virtualization hardware that enables ILM; the ILM section describes the software. Remember that both are needed for good data management. SNIA is currently agreeing specifications for consistent ILM rules and metadata, which will integrate ILM with physical storage tiers.

Virtualization Defined

Few organisations or vendors can agree on a definition of Storage Virtualization.

SNIA came up with the following: "The act of integrating one or more (back end) services or functions with additional (front end) functionality for the purpose of providing useful abstractions. Typically virtualization hides some of the back end complexity, or adds or integrates new functionality with existing back end services."
The Enterprise Strategy Group defines Virtualization as "a technology that gathers data location information from physical storage devices, network services and applications, and then abstracts the locations into logical views for end users".

Taking these definitions together, combining them with others, and not worrying too much about rigorous formality, Virtualization can be summarised into the following points.

  • Storage Virtualization can simplify the physical implementation of different devices by standardising them into one logical view.
  • Virtualization divorces the storage from the server and lets the server concentrate on processing application requests.
  • Virtualization removes the requirement for technicians to have a good knowledge of each server platform to be able to manage and configure physical storage.
  • Virtualization can add value to a physical implementation by providing extra functionality.

The advantages of Virtualization

Common tool set

Storage software can now cost more than the hardware it supports. Each storage vendor has its own brand of software, and each brand requires a different skill set to manage it. Virtualization could permit centralised and consistent management of all volumes within a data centre using the same methods and products, no matter what the platform. This should reduce staff costs, as the process of managing and relocating the data would be standardised. Virtualization can also cut the cost of replication software, as it uses one set of software tools to manage all storage. If the mirroring and copying functionality moves to the virtualization server then there is no requirement for storage subsystem based products. The predicted cost savings suggest a payback within two years.

Data placement within storage tiers

Tiered storage can be defined as a set of storage pools with different performance and availability characteristics; the most important point is that each tier has a different cost. In the past, most sites adopted a 'one size fits all' approach, and either used expensive storage for all their data, or went for a middle ground compromise.
The ultimate goal of virtualization is to consolidate multiple storage devices from different vendors into a set of storage pools or tiers. One can then place data into the correct pool, depending on the value of that data. The real trick is to recognise that the value of data changes with time, and so move the data around the tiers as its value changes.
The problem with any large scale adoption of storage tiers was a combination of three things: finding some way to connect all the tiers together, some way of moving the data between the tiers, and some way to decide which tier to use for a piece of data at a point in its lifecycle. SANs fixed the connectivity problem and Virtualization should fix the data movement issue. The third issue should be resolved by ILM.
ILM will add the final building block to virtualization by providing the 'business policies' which describe what sort of management different classes of data will get. The policies will determine which tier is used to hold the data. The policies will also determine what sort of backup that class of data will get, and whether it will be moved to less expensive media if it becomes inactive. Holding the policies centrally should simplify enterprise storage management, and allow a single administrator to manage a lot more storage.
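
The idea of a policy choosing a tier as data ages can be sketched in a few lines of Python. This is only an illustration and not how any ILM product works: the tier names, the Policy fields and the use of 'days since last access' as a stand-in for data value are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical tier names, ordered from most to least expensive.
TIERS = ["fast_fc_disk", "cheap_sata_disk", "tape"]


@dataclass
class Policy:
    """Illustrative business policy for one class of data."""
    data_class: str
    top_tier_days: int   # stay on the top tier if accessed within this many days
    mid_tier_days: int   # stay on the middle tier if accessed within this many days


def choose_tier(policy: Policy, days_since_access: int) -> str:
    """Pick a tier from the age of the last access (an assumed proxy for value)."""
    if days_since_access <= policy.top_tier_days:
        return TIERS[0]
    if days_since_access <= policy.mid_tier_days:
        return TIERS[1]
    return TIERS[2]


# Example: a made-up policy for finance data.
finance = Policy("finance", top_tier_days=30, mid_tier_days=365)
for age in (5, 90, 400):
    print(age, choose_tier(finance, age))
```

A real policy would also carry backup and retention rules for each class of data; they are left out here to keep the sketch short.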

Data Replication

Data replication involves taking an instant copy of a disk, a file space, or maybe a file. The benefits are a combination of fast backup, which eliminates the backup window, and a point-in-time disk copy for fast recovery from virus attacks and rapid creation of test data. It has been possible to do this for some time with host and subsystem based Virtualization systems, but only to disks of the same type. The Replication section has more details.
Virtualization combines this function with tiered storage and enables replication from expensive disk to cheap disk. Cheap SATA disk costs a lot less than fibre-channel disk, is suitable for disk based backups, and may be adequate for test systems. The financial benefits of using cheaper storage tiers for these applications can be considerable.
Cross-site replication has always required two disk subsystems of similar type and cost. Virtualization permits cross-site replication to unlike, and potentially much cheaper, devices.

Data Migration

Data migration is similar in principle to replication, except that the data is moved between devices, not copied. Because the file spaces seen by applications are virtualized and then mapped to a physical implementation, that physical implementation can be changed by copying the data and then adjusting the mapping pointers. If you need to free up older devices for disposal, the data can be moved off the old devices transparently with no need to stop the applications. Also, if a file space has a performance problem it can be moved to a quieter disk without any need to stop applications, a process known as 'hot file reallocation'. The benefits are reduced application downtime and fewer requirements to work unsocial hours.
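
The 'copy the data, then adjust the mapping pointers' step can be shown with a toy model in Python. Everything here is hypothetical (the VirtualVolume class, the extent numbering and the device names), and the sketch ignores the hard part of a real migration, which is handling writes that arrive while the copy is in progress.

```python
class VirtualVolume:
    """Maps virtual extent numbers to (device name, physical extent) pairs."""

    def __init__(self):
        self.mapping = {}

    def read(self, devices, vext):
        # The application only ever asks for a virtual extent.
        dev, pext = self.mapping[vext]
        return devices[dev][pext]


def migrate_extent(volume, devices, vext, target_dev, target_pext):
    """Move one extent to a new device without changing its virtual address."""
    src_dev, src_pext = volume.mapping[vext]
    # 1. Copy the data to the new physical location.
    devices[target_dev][target_pext] = devices[src_dev][src_pext]
    # 2. Adjust the mapping pointer so reads now go to the new location.
    volume.mapping[vext] = (target_dev, target_pext)
    # 3. Release the old physical extent so the old device can be retired.
    del devices[src_dev][src_pext]


# Example: move virtual extent 0 from an old array to a new one.
devices = {"old_array": {0: b"payroll"}, "new_array": {}}
vol = VirtualVolume()
vol.mapping[0] = ("old_array", 0)
migrate_extent(vol, devices, 0, "new_array", 7)
print(vol.read(devices, 0), vol.mapping[0])
```

The application still reads virtual extent 0 after the move; only the mapping behind it has changed.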

Capacity management

The process of managing LUNs, volumes and files in the Open Systems area is still a major issue. If it is necessary to change a LUN size or add new servers, this requires a lot of complicated manual effort. The processes differ depending on the platform, and that makes the issue worse. To make the effort more manageable, storage managers often make LUNs big enough to cope with 12 months or so of growth, and that can be expensive. The physical / logical separation that virtualization introduces means you can expand, define and delete LUNs without affecting the rest of the system, and use up free space more efficiently.
Virtualization facilitates more efficient capacity management as it allows you to share physical disk capacity between different server platforms. It should mean the end of the days when you had gigabytes of free space for Windows, while your Unix systems were struggling to find space. Virtualization should also make capacity management easier, as it can automatically expand disks, file systems and databases when they hit a space threshold.
Another benefit of virtualization is that the physical storage implementation is separated from the logical view at the file servers. This should also allow you to mix and match many kinds of physical storage, from several vendors, while hiding the detail of this from the application servers.
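
The automatic expansion at a space threshold mentioned above might look something like the following Python sketch. The function name, the 90% threshold and the 10 GB growth step are assumptions chosen for illustration, not values taken from any product.

```python
def expand_if_needed(volume_gb, used_gb, free_pool_gb, threshold=0.9, step_gb=10):
    """Return (new volume size, new free pool size) after an optional expansion.

    The volume grows by step_gb, or by whatever is left in the shared pool,
    once the used fraction reaches the threshold.
    """
    if used_gb / volume_gb < threshold:
        return volume_gb, free_pool_gb       # below threshold: nothing to do
    grow = min(step_gb, free_pool_gb)        # never take more than the pool holds
    return volume_gb + grow, free_pool_gb - grow


# Example: a 100 GB volume that is 95% full, with 50 GB free in the shared pool.
print(expand_if_needed(100, 95, 50))
```

Because the free space comes from a shared pool rather than from a per-platform allocation, the same pool can serve Windows and Unix volumes alike.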

Single Subsystem effect

It should be noted here that there is a trend towards consolidating all of your data into one large, multi-terabyte disk subsystem, or maybe two remote mirrored subsystems. Most of these subsystems will support different classes of disks internally. If you install a single subsystem, then you get most of the above advantages without needing virtualization.

The three types of Virtualization

While there are lots of different virtualization products around, they are all variants of three basic architectures: Host based, Controller based and Network based. Network based virtualization has variants and combinations of In-band and Out-band, and of Switch based and Server based. Some of the various architectures are mutually exclusive and some are complementary. The following summaries may help to reduce the confusion.

Host-based Virtualization

Host based virtualization has been around for years and involves splitting up a physical volume into virtual disks or LUNs using volume manager type software. Examples are Logical Volume Manager for UNIX, Veritas Storage Foundation for Windows or VMware. The function of the LVM is to intercept IO requests from applications, work out which physical storage subsystem they will be directed to, then translate the IOs into a format that makes sense to those physical boxes. As this Virtualization runs on the host, it will consume host CPU. Host based Virtualization may be ideal if only a single host is involved, but it can cause major problems if disks are shared between servers as the servers could access the same physical space on a disk. Every server needs to know about the other servers' virtualization maps to ensure data integrity. This issue, and the ongoing requirement to keep on top of server patches and licensing, is driving storage vendors off host based virtualization and onto switches.
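
The translation step that a volume manager performs can be illustrated with a simple Python sketch. It assumes a logical volume built by concatenating segments from physical disks; the segment layout and the translate function are hypothetical and are not taken from any particular volume manager.

```python
def translate(virtual_block, segments):
    """Translate a virtual block number to (device, physical block).

    segments is a list of (device, start_block, length) tuples, in the
    order in which they are concatenated to form the logical volume.
    """
    if virtual_block < 0:
        raise ValueError("virtual block numbers start at zero")
    offset = virtual_block
    for device, start, length in segments:
        if offset < length:
            return device, start + offset    # the block lives in this segment
        offset -= length                     # skip past this segment
    raise ValueError("virtual block is beyond the end of the logical volume")


# Example: an 800 block logical volume made from two physical segments.
segments = [("disk_a", 1000, 500), ("disk_b", 0, 300)]
print(translate(499, segments), translate(500, segments))
```

If two hosts held different copies of a segment list like this for the same physical disks, they could both write to the same physical blocks, which is the data integrity problem described above.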

Controller-based Virtualization

Application servers usually write data out to file spaces. Controller-based virtualization creates virtual images of those file spaces in the storage subsystem and maps them to pools of physical disks.
Virtualization in the controllers or storage subsystems began with cached and RAID storage controllers. Neither of these simplified storage subsystems, but they both added value: faster responses and increased resilience.
Recent developments have introduced very large storage controllers with hundreds of terabytes of internal capacity and petabytes of external storage connection. The application servers are connected through a SAN to the virtualization controller, which hosts the top-tier storage directly; third party mid and low tier disks can then be attached to it. This is an in-band solution with the architecture scaled to a single subsystem managing petabytes of storage. Even though it will have lots of redundant internal components, the subsystem will be a Single Point Of Failure and it might need lots of channels for performance.
The main problem with controller based virtualization is that it is difficult to share an FC SAN between different controllers, especially from different vendors. Controller based virtualization will pretty much lock you into one vendor.

An example is the HDS Universal Storage Platform. See the USP section for details.

Network-based Virtualization

If host based virtualization lies in the host and controller based virtualization lies in the storage subsystem, it should be obvious that Network based virtualization lies in the SAN that connects the Hosts to the Storage Servers. However, there are a few different implementations, and to some extent these depend on how they manage the application data.

Two types of information pass between the hosts and the storage: data and metadata. The data is the blocks of information that make up files and records; the metadata contains information about the data, including the location of the blocks of data.

If the virtualization is 'in-band' then it lies in the data path so all the data and the metadata pass through it. The virtualization 'appliance' will create and allocate virtual volumes on the storage subsystems as required. It presents these back to the hosts, and when it receives an I/O request from an application the virtualization server will translate the IO from the file system virtual request to the physical disk IO and pass it on to the correct disk array.

If the virtualization is 'out-band' then it will trap the metadata IO and use that to set up a path for the data IO. Once the path is defined, the appliance takes no further part in the operation, so the metadata passes through the virtualization appliance but the data does not. The virtualization appliance will create and allocate virtual volumes on the storage subsystems, but it requires agent software in the data path to do the virtual/physical IO translation and to present virtual volume information to the operating system. When an application requests an I/O, the software agent performs the virtual/physical translation and directs the I/O request to the appropriate storage subsystem.
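
The difference between the two approaches can be made concrete with a small Python model. The class names and counters below are invented for the illustration and do not correspond to any real product; the point is only to show which component touches the data.

```python
class InBandAppliance:
    """Sits in the data path: every data I/O passes through it."""

    def __init__(self, mapping, arrays):
        self.mapping = mapping      # virtual block -> (array name, physical block)
        self.arrays = arrays        # array name -> {physical block: data}
        self.data_ios = 0

    def read(self, vblock):
        array, pblock = self.mapping[vblock]
        self.data_ios += 1          # the appliance itself moves the data
        return self.arrays[array][pblock]


class OutOfBandAppliance:
    """Holds the mapping only: answers metadata lookups, never carries data."""

    def __init__(self, mapping):
        self.mapping = mapping
        self.metadata_requests = 0

    def lookup(self, vblock):
        self.metadata_requests += 1
        return self.mapping[vblock]


class HostAgent:
    """Agent in the data path that does the virtual/physical translation."""

    def __init__(self, appliance, arrays):
        self.appliance = appliance
        self.arrays = arrays

    def read(self, vblock):
        array, pblock = self.appliance.lookup(vblock)   # metadata via the appliance
        return self.arrays[array][pblock]               # data direct from the array


arrays = {"array_a": {10: b"block-10"}}
mapping = {0: ("array_a", 10)}

in_band = InBandAppliance(mapping, arrays)
out_band = OutOfBandAppliance(mapping)
agent = HostAgent(out_band, arrays)

in_band_result = in_band.read(0)
out_band_result = agent.read(0)
print(in_band_result == out_band_result, in_band.data_ios, out_band.metadata_requests)
```

Both paths return the same block. In the in-band case the appliance handles the data itself, while in the out-band case it only answers the lookup and the agent fetches the data directly.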

The virtualization code can either run on a dedicated server or 'appliance', or it can run inside the SAN switches. It might seem intuitive that virtualization in the SAN switches must be in-band, but in fact the virtualization software can run on blades inside the switch and be out-band.

The IBM SAN Volume Controller (SVC) is in-band virtualization that either runs on xSeries servers, or can run as embedded software within a Cisco MDS 9000 switch. More detail can be found in the SVC section.

IBM used to sell an out-band solution called SAN File System (SFS) but has now withdrawn it from marketing.

EMC Invista runs on an out-band server with agent software in the SAN switches. It supports replication between devices, dynamic volume movement and volume pooling. See the Invista page for more details.

'Grid Storage' is yet another virtualization buzz word. Grid storage can be fitted together and expanded easily, a bit like Lego blocks. Virtualization combines this modular storage with common management software. An example is HP Storage Grid, but early indications are that it will only support HP hardware. The Storage Grid page has more details.

Conclusion

It appears that storage virtualization has a lot to offer in terms of simplifying the management of storage from disparate platforms and different subsystems, but maybe not so much if you run all your disk storage on one subsystem. The cost case for consolidation of storage management software also looks compelling. There is a confusing choice of virtualization solutions and it looks like it will be difficult to switch to a different solution after implementation. The key appears to be a thorough understanding of your existing storage and also a good prediction about how it will develop in future. Based on that, you should be able to draw up a list of requirements to help you to decide if virtualization is suitable for you, and then select the appropriate solution for your environment.
There is a perception in the industry that virtualization is moving towards intelligent switches or gateway devices. The future vision is that these devices will be application aware, and will be able to deliver storage based on service requirements rather than based on hardware capability. A large SAN director will be able to handle Fibre Channel, iSCSI and Ethernet concurrently, and so bring all parts of the storage continuum together. Add an intelligent blade to the director which can hold ILM based data management rules, and you have almost got storage utopia. Only time will tell if the technology will live up to this promise.


Copyright © Lascon Storage Ltd. 2000 to present date. By entering and using this site, you accept the conditions and limitations of use