ESS Enterprise Storage Servers
Enterprise Storage Selection
What is an enterprise subsystem? My definition was one that supports all the major operating systems: z/OS, Unix variants, Linux variants, Windows and Netware. However, it seems only 4 vendors now support z/OS mainframes, and I've had a few discussions lately with people who say that using z/OS as a criterion is too restrictive. The 4 vendors that will support z/OS are EMC, IBM, HDS and HP. Oracle, a disk vendor by way of their SUN purchase, does not resell HDS disks now and so does not support z/OS anymore. NetApp is now a major player in the disk market so it seems reasonable to include them here and widen the definition of enterprise storage to include non-z/OS support.
The first section discusses the enterprise products from the six big enterprise vendors; EMC, HDS, IBM, HP, Oracle and NetApp. The second section is a table that compares some of their products.
EMC
History
EMC started out producing cache memory and developed solid state disks, memory devices that emulated spinning disks, but with much faster performance. These solid state disks were usually re-badged and sold by StorageTek.
Around 1988, EMC entered the storage market in its own name, selling Symmetrix disk subsystems with what at that time was a very large 256MB cache fronting 24GB of RAID 1 storage. Their mosaic architecture was the first to map IBM CKD mainframe disk format to standard FBA open system backend disks, and as such, could claim to be the first big user of storage virtualisation. In those days, EMC developed a reputation for delivering best performance, but at a price.
In 2008, EMC became the first to use flash storage in an enterprise subsystem, for high performance applications. EMC introduced their latest addition to the Symmetrix range, the V-MAX, in April 2009.
Architecture
The old Symms used the Direct Matrix architecture, now called Enginuity. The principle behind Direct Matrix is that all IO comes into the box from the front-end directors. These are connected to global memory cache modules, which are in turn connected to back-end directors that drive the IO down to the physical disks. This connectivity is all done by a directly connected, point-to-point fibre-channel matrix.
The V-MAX architecture builds on the older DMX architecture, but has some fundamental differences. The Directors and cache are combined together into a V-MAX engine. Each V-MAX engine contains two directors and each director contains Host and Disk ports, a CPU complex, cache memory and a Virtual Matrix interface.
The current V-MAX consists of between 1 and 8 engines. Each engine is built from commodity processors, cache, host adapters and disk adapters. This makes them relatively cheap to produce and easier to upgrade later. Internally, the engine components communicate locally, so memory access is local. However, the engines must communicate with each other and also support the Enginuity global memory concept. To achieve this, the memory is virtualised, and each engine communicates with other engines using fiber connect and RAPIDIO technology. When a director gets a memory request it then checks the location, and if it is local it is served at memory bus speeds. If it is remote, then the request is packaged up and sent off to the remote director for processing. Presumably EMC have optimised this setup to ensure that most memory accesses are local. Certainly the EMC diagrams show each engine with 2 directors, 16 host ports and 16 disk ports, but only 4 virtual matrix ports. There are two of these ports per director, and they are connected to other engines with two MIBE (Matrix Interface Boards). The Cache memory is mirrored, and in configurations with 2 or more engines, it is mirrored between engines.
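The local-versus-remote memory check described above can be illustrated with a toy model. This is only a sketch: the class names, the contiguous address-to-director mapping and the `SLOTS_PER_DIRECTOR` value are assumptions made for illustration, not EMC's actual design.

```python
from dataclasses import dataclass, field

SLOTS_PER_DIRECTOR = 1024  # hypothetical size of each director's share of global memory


@dataclass
class Director:
    director_id: int
    memory: dict = field(default_factory=dict)  # this director's slice of global memory
    local_hits: int = 0
    remote_requests: int = 0

    def owns(self, address: int) -> bool:
        # Assumed mapping: global addresses are split into contiguous ranges,
        # one range per director.
        return address // SLOTS_PER_DIRECTOR == self.director_id


class VirtualMatrix:
    """Toy stand-in for the interconnect between engines."""

    def __init__(self, directors):
        self.directors = {d.director_id: d for d in directors}

    def read(self, requester: Director, address: int):
        if requester.owns(address):
            # Local access: served directly from the requester's own memory.
            requester.local_hits += 1
            return requester.memory.get(address)
        # Remote access: the request is handed to the owning director.
        requester.remote_requests += 1
        owner = self.directors[address // SLOTS_PER_DIRECTOR]
        return owner.memory.get(address)
```

Counting `local_hits` against `remote_requests` in a model like this shows why the ratio of local to remote accesses matters: every remote request has to cross the small number of virtual matrix ports.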
This architecture extends the direct matrix principle, but now the matrix is virtual. One of the difficulties in machine hall design is leaving room for various frames to grow as cabinets are added to increase capacity. The V-MAX can now be split into 2 frames, where the system bays can be up to 25m apart.
One interesting feature is the storage tiering, based on T0 Flash storage, T1 FC drives and T2 SATA drives.
EMC FAST, or "fully automated storage tiering", checks for data usage patterns on files and moves them as required between Fibre Channel, SAS and flash drives to optimise cost effectiveness and performance requirements. Supported subsystems include the V-MAX, the Clariion CX4 and the NS unified system.
FAST can also be configured manually to move application data to higher performing disk on selected days of the month or year. This could be useful for a monthly payroll application, for example.
EMC introduced FAST2 in August 2010, which introduced true LUN tiering and can manage data at block level.
Models
The Symmetrix DMX-4 became generally available in August 2007. It is very scalable, from 240-2,400 drives. It supports 1 TB SATAII drives and 73 or 146GB flash drives. Internally, it uses 4Gb/s communications end to end, with 8 Gb/s support for FICON or Fibre Channel host connections, internal connectivity and Fibre Channel Drives. The backend architecture is FCAL.
The V-MAX starts with the single-cabinet, entry-level Symmetrix V-MAX SE that can hold 120 disks. This can be extended by adding up to 10 more frames, each holding 240 disks. The latest release can consist of two disconnected frames.
Software
DMX software includes EMC Symmetrix Management Console for defining and provisioning volumes and managing replication. The TimeFinder products are used for in-subsystem and PIT replication, and SRDF for remote replication. SRDF can run in full PPRC compatibility mode, and can also replicate to three sites in a star configuration.
Enginuity 5784 adds new features including SRDF/EDP (Extended Distance Protection) which is similar to cascaded SRDF except that it uses a DLDEV (DiskLess Device) for the intermediate hop.
EMC was lacking in z/OS support for some years, but they have now licensed PAV and MA software from IBM, and have provided z/OS Storage Manager to manage mainframe volumes, datasets and replication.
Openness
In general, EMC subsystems are not Open, the exception being if they are fronted by an EMC VPLEX, which allows different manufacturers' devices to co-exist with EMC. Software-wise, SRDF will only work between EMC devices, and even then, not with all of them. EMC Open Replicator has the ability to take PIT copies from selected non-EMC subsystems to DMX, or to copy from DMX to selected non-EMC devices.
The V-MAX is a closed virtual system, as it cannot connect to storage subsystems within the EMC range.
Full intersite connectivity is available with VPLEX.
HDS
History
Hitachi Data Systems was always known as the company that manufactured disks that were exactly compatible with IBM, but worked a little faster and cost a little less. HDS broke that mould when they introduced the 'Lightning' range of subsystems in 2000, which was a merging of telephony cross-bar technology and storage subsystem technology. They extended and developed that architecture further with the USP (Universal Storage Platform), released in September 2004. In September 2010 HDS released the Virtual Storage Platform (VSP), a purpose built subsystem that provides automated tiering between flash and spinning disk drives.
Architecture
Unlike competing storage subsystems, the VSP is not built from 'commodity' components, but uses parts designed and manufactured within HDS. HDS claims that this allows them to make a subsystem that outperforms its rivals. There is only one model of VSP; if more performance or capacity is required, you can 'scale up', 'scale out' or 'scale deep'.
The VSP is composed of 'racks', 'chassis' and 'boards'. The base model is a single rack and can be 'scaled up' by adding a second control rack and up to four disk racks. The base rack contains one control chassis and one drive chassis. The control chassis contains a number of functional boards, and more boards can be added to the first chassis to improve performance, and another disk chassis added to increase capacity. This is called 'scaling out'. Like the USP, the VSP supports adding external disks behind a virtualisation unit, and this is called 'scaling deep'.
There are five different kinds of functional boards.
- The Front End Director boards or FEDs provide the interface to host servers and also to any external storage that may be attached to the VSP. FEDs can be either 16 port 8Gb/s FICON or 16 port Fibre Channel.
- The Back end Director boards or BEDs interface to the disk or SSD devices. Each chassis can hold two or four BED boards and each BED has eight 6Gb/s SAS links, a significant departure from the USP which used F-CAL links to the disks. Two boards are normally installed and the extra two boards are added for extra performance, as that gives 32 * 6 Gb/s SAS links in a chassis. The BEDs will connect to 128 disk Small Form Factor (SFF) disk containers or DKUs and 80 disk Large Form Factor (LFF) DKUs. The SFF DKU can hold 200GB SSD disks, or 146GB, 300GB or 600GB SAS disks. The LFF DKU can hold 500GB SAS SSD or 3TB SAS disks. The maximum raw capacity with 3TB SAS is 3,769 TB.
The BEDs generate RAID parity. The RAID options are very flexible, but HDS recommends RAID5 for SSD devices and RAID10 for disks. Other RAID configurations are possible, including RAID6 P+Q. You can also buy a BED with no disks, which you can use as a virtualisation engine for external disks.
- A Virtual Storage Director (VSD) is the central processor and data movement engine. Either two or four VSDs are installed in each chassis; four are used for added performance. The processors are now Intel, another change from USP technology. Each VSD board contains a quad-core Xeon CPU and 4GB of RAM. The VSDs are paired for failover purposes and they hold their meta-data and control data in shared system memory to make this possible. If one VSD fails then failover to the other one is automatic with no loss of service. When the failed VSD is repaired, failback is also automatic. The maximum number of processors is 32; 2 chassis with 4 VSDs, each with 4 processors. Control memory is not on a separate board anymore, but is held on the VSD.
- The Cache boards or DCAs (Data Cache Adapter) hold the system memory. This contains transient user IO activity and also configuration details like RAID setup, dynamic tiering status and remote copy operations. Up to 6 DCAs can be installed per chassis, and each DCA board can hold between 8GB and 32GB, giving a maximum subsystem cache size of 1TB. Each DCA board also has either one or two 32GB SSDs to allow the board to back up configuration details and any outstanding activity if the power drops. This means the VSP does not need the heavy, expensive batteries that were required to protect from power failure on the USPs. Write blocks are mirrored, but not read blocks, which means the cache utilisation is improved.
- The Grid Switch boards or GSWs are PCI express based, connected by a crossbar switch. They form a High-Star-E network with two or four GSWs in each chassis. Every GSW board has 24 bi-directional 1GB/s ports, connected as follows:
- 8 ports connect to FED and BED boards and transfer both data and meta-data
- 4 ports connect to VSD boards for job requests and system data transfer (like memory access requests)
- 8 ports connect to DCA boards for user data transfer and control memory updates
- 4 ports are used if an extra chassis is installed, to cross-connect to the matching GSW in the second chassis.
The switched PCI-e architecture means that internal communication is non-blocking and every input port can connect to any piece of memory and every BED port can connect to any disk. This means that data does not need to be placed behind specific ports to ensure performance.
The idea behind storage tiering is an old one - you keep your busiest data on fast, but expensive storage, then as it ages and becomes less busy you move it down the hierarchy to cheaper, slower storage. To achieve this, you had to solve two problems. First, you had to run a report that identified data access profiles and use that report to work out what data was in the wrong place in this timeframe. Second, you had to move the incorrectly positioned data to the correct place in the storage hierarchy, a process that often required application downtime.
This data movement might involve whole volumes, or whole files. However, in many cases files are active for part of the day and waste expensive disk space for the rest of the day. Some very large files can have parts that are very active and parts that are rarely accessed, and moving the whole file to expensive storage is wasteful.
HDS has addressed those problems with Hitachi Dynamic Tiering (HDT). Storage inside the VSP can be either fast but expensive SSD, or slower and cheaper SAS/SATA drives. When you allocate a virtual volume on a VSP, it stripes the data over all the physical volumes in 42MB chunks or pages and that striping can go over both SSD and spinning disk. The page size is much bigger than that used by other storage manufacturers, and HDS has used that bigger size to allow it to position parts of files on different types of disk. The process is called sub-LUN tiering.
Page access is checked on a regular basis, and if a page becomes 'hot' it is automatically moved up to SSD disk, while pages that have cooled down again are moved back to SAS disks. This means that active parts of files are held on high performance SSD and inactive parts on SAS disk, so optimising SSD usage. HDS claims that this effectively means that all files are on fast disk.
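The hot and cold page movement can be sketched in a few lines. This is a minimal illustration only: the `hot_threshold` value, the per-period access counts and the function names are assumptions, and HDS does not publish the algorithm in this form.

```python
PAGE_SIZE_MB = 42  # page size quoted in the text


def pages_for_ssd(access_counts, ssd_pages, hot_threshold=100):
    """Pick which pages should live on SSD.

    access_counts: dict of page id -> I/O count in the last monitoring period.
    ssd_pages: number of pages that fit on the SSD tier.
    hot_threshold: hypothetical I/O count above which a page counts as hot.
    """
    hot = [page for page, count in access_counts.items() if count >= hot_threshold]
    # Busiest pages first, so the SSD tier is filled with the hottest data.
    hot.sort(key=lambda page: access_counts[page], reverse=True)
    return set(hot[:ssd_pages])


def plan_moves(current_ssd, access_counts, ssd_pages, hot_threshold=100):
    """Return (promote, demote) lists of page ids for the next tiering cycle."""
    target = pages_for_ssd(access_counts, ssd_pages, hot_threshold)
    promote = sorted(target - current_ssd)  # hot pages currently on SAS
    demote = sorted(current_ssd - target)   # cooled pages currently on SSD
    return promote, demote
```

For example, with access counts `{0: 500, 1: 3, 2: 150, 3: 120}`, room for two pages on SSD, and pages 1 and 3 currently on SSD, `plan_moves` promotes pages 0 and 2 and demotes pages 1 and 3. Only 2 * 42MB of data moves, rather than a whole file or volume.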
HDT is not just a VSP feature, it is also used on HDS NAS and Content Management storage systems.
Some other VSP features are:
Thin provisioning. Disk space is just allocated as needed, up to the size of the virtual volume. When data is deleted from the virtual volume, a Zero Page Reclaim utility returns unused storage pages to the spare pool.
Automatic Dynamic Rebalancing. When new physical volumes are added to the subsystem virtual volume pages are re-striped to ensure they are still evenly spread over all the physical volumes.
Universal Virtualisation Layer. If you put some external storage behind the VSP then it is carved up and allocated to look the same as the internal storage. This means that mirroring, snapshot and replication software all work consistently for both internal and external storage.
Virtual Ports. Up to 1024 virtual FC ports can share the same physical port. Each attached server will only see its own virtual ports, which means they don't get to access each other's data. This feature allows the VSP to efficiently use the high bandwidth that is available on an individual port.
All data stored on the VSP is hardware encrypted for security.
The USP
The older USP architecture was based on cross-bar switch connectivity, in-subsystem virtualisation, the ability to partition a subsystem into component LPARS and the ability to replicate data to externally attached subsystems.
Software
Hitachi High Availability Manager provides non-disruptive failover between VSP and USP systems and means instant data access at the remote site if the primary site goes down. This is aimed at non-mainframe SAN based applications.
Mainframe availability uses TrueCopy synchronous remote mirroring and Universal Replicator with full support for GDPS.
The Storage Command suite includes:
- Hitachi Device Manager for disk and storage configuration
- Hitachi Replication Manager
- Hitachi Storage Capacity Reporter for usage trending
- Hitachi Tuning Manager
Openness
The VSP is an open architecture, in that it works with disks from many other vendors and virtualises the data. The list of supported vendors includes EMC, HP, IBM and SUN, as well as older HDS devices. In general, the USP will support the hardware, but replaces the OEM replication software with its own.
IBM
History
The original IBM hard drive, the RAMAC 350, was manufactured in 1956, had a 24 inch (609mm) platter, and held 5 MB. The subsystem also weighed about 1 ton. That was a bit before my time, but when I joined IT 23 years ago, the storage market was dominated by IBM, the mainframe was king, and the standard disk type was the IBM 3380 model K which contained 1.89 GB. IBM lost their market leader position to EMC sometime in the 1990s.
IBM introduced the DSxxxx series in late 2004 in response to competition from EMC and HDS. They updated their internal bus architecture to increase the internal transfer speed by 200% plus over the ESxxx series, and also abandoned their SSA disk architecture for a switched FC-AL standard. The DS8300 is essentially a follow-on from the ESS disk series, and re-uses much of the ESS microcode.
IBM introduced the XIV in 2008. The XIV is Open Systems only and sells alongside the DS8000 series which supports Open and Mainframe systems.
DS8000 Architecture
The DS8000 architecture effectively consists of two processor complexes called servers that are connected to hosts using host adaptors, and disks using device adaptors.
The processor complex consists of 2-way or 4-way POWER6 or POWER7 servers containing two types of cache, volatile and persistent memory. Every write IO is written to volatile memory in one processor, and non-volatile in the other before the write is acknowledged as complete. The subsystem effectively works internally as two separate units, but one server can run the whole subsystem if the other fails.
If the processor complexes hold 4-way servers then the DS8K subsystem can also be split logically into two completely independent LPARS, either as a 50/50 or a 75/25 split. Each processor complex is split into two server LPARS, and then a Storage Facility Image or SFI is built using one server LPAR from each processor complex. An SFI is sometimes called a storage LPAR, but note that this is not the same as a server LPAR.
Internal connectivity between servers and device adaptors uses RIO-G connectors, the same as is used internally in the p-series servers. These links can run at a 2 GB per second sustained bandwidth and permit the sharing of host adapters between servers. Host adaptors can be either ESCON, or 4Gb/s FICON / FC. Each Host adaptor provides 4 Host connections, but has only two internal protocol engines, so there may be a degree of blocking.
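The two-copy write rule can be pictured with a small model. This is a sketch under stated assumptions: the `Server` and `Subsystem` classes and the `recover_after_failure` method are hypothetical names used for illustration, and the real microcode is far more involved.

```python
class Server:
    """One of the two processor complexes in the toy model."""

    def __init__(self, name):
        self.name = name
        self.volatile_cache = {}  # lost if this server fails
        self.nvs = {}             # persistent memory holding the partner's writes


class Subsystem:
    def __init__(self):
        self.servers = [Server("server0"), Server("server1")]

    def write(self, owner_index, track, data):
        owner = self.servers[owner_index]
        partner = self.servers[1 - owner_index]
        # Copy 1: volatile cache on the server that owns the volume.
        owner.volatile_cache[track] = data
        # Copy 2: non-volatile memory on the other server.
        partner.nvs[track] = data
        # Only now is the write acknowledged as complete.
        return True

    def recover_after_failure(self, failed_index):
        # The survivor still holds every acknowledged write of the failed
        # server in its own non-volatile memory.
        survivor = self.servers[1 - failed_index]
        return dict(survivor.nvs)
```

Because the second copy always sits in the other server, losing either server never loses an acknowledged write, which is what lets one server take over the whole subsystem.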
Device adaptors are installed in pairs, and include the RAID controllers. The device adaptors are connected to the disks using Switched FC-AL, but they are not in an FC-AL loop. The disks are allocated FC-AL addresses to allow the switching to work, but once the connection is made, communications are point-to-point Fibre Channel.
For more detail, try http://www.redbooks.ibm.com/redpieces/pdfs/sg246786.pdf
DS8000 Models
The top range DS8870 is a cabinet mounted subsystem that can support a maximum of 1024 disk drives and will hold a maximum of 2.3 PB raw capacity using 3TB SAS disks. The base cabinet holds 128 disk drives; up to two expansion cabinets can be added, each holding 256 disk drives. The raw disks are supplied in blocks of sixteen, but are configured in groups of eight, with each group being called an array group. All the disks in an array group must have identical size and rotation speed.
The DS8000 series currently has three models available: the DS8870, the DS8800 and the DS8700.
The DS8000 extent pools can be a mixture of SSD and spinning disk, so individual LUNs and mainframe CKD volumes can have some extents on SSD and some on Disk. If you already have discrete SSD and disk pools you can merge them together to create a mixed pool.
You can move LUNs or volumes manually and non-disruptively between storage tiers, but the Easy Tier product enhances this. It moves data at 1 GB storage stripe level rather than full volumes and the movement is policy based, depending on how active or hot a 1GB storage stripe extent is.
Manual movement is called ELMR or Entire-LUN Manual Relocation, while the automated stripe-based migration is called Easy stripe.
DS8000 Software
The DS software includes Flashcopy for internal subsystem point-in-time data copies, IBM Total Storage DS Manager for configuration and Metro/Global mirror for continuous inter-subsystem data replication.
The older ESS subsystems supported two kinds of z/OS Flashcopy, a basic version that just copied disks, and an advanced version that copied disks and files. DS only supports the advanced Flashcopy.
Flashcopy versions include:
- Multi-relationship, which supports up to 12 targets
- Incremental, which can refresh an old Flashcopy to bring the data to a new point-in-time without needing to recopy unchanged data
- Remote Mirror Flashcopy, which permits dataset flash operations to a primary mirrored disk
- Inband Flashcopy commands, which permit the transmission of Flashcopy commands to a remote site through a Metro Mirror link
- Consistency Groups, which flash a group of volumes to a consistent point-in-time. A consistency group can span multiple disk subsystems.
Remote mirroring versions include:
- Metro Mirror, synchronous remote mirroring up to 300km, previously called PPRC
- Global Copy, asynchronous remote data copy intended for data migration or backup, previously called PPRC-XD
- Global Mirror, asynchronous remote mirroring
- Metro/Global Mirror, three site remote replication, two sites being synchronous and the third asynchronous
- z/OS Global Mirror, z/OS host based asynchronous remote mirror, previously called XRC
- z/OS Metro/Global Mirror, three site remote replication, two sites being synchronous and quite close together, the third asynchronous and remote.
Openness
The DS subsystem series is self contained and does not interface with any other vendor's storage subsystem. For Open Systems data, IBM does support mirroring and copying to other vendors' subsystems if they are fronted with SVC virtualisation.
Futures
IBM promised a lot of enhancements to the DS series when they were first announced, and has delivered SATA drive support, space efficient Flashcopy, 4Gb FICON and virtual LUN space support so far.
The DS Subsystem LPARing is currently restricted to two LPARS both of which must be the same size or in a 75/25 split.
XIV
In early 2008 IBM bought XIV, a small storage company based in Tel Aviv. The XIV is a different type of box for IBM, and they sell it alongside their DS8000 range as an open systems solution.
XIV G3 Architecture
The XIV is based on a grid architecture of up to 15 interconnected but independent units called data modules. There is no common backplane; the modules are interconnected with Infiniband switches. Each data module contains a 2.4GHz Nehalem CPU, cache and up to 12 storage disks. Interface modules are a special type of data module and contain the above, but can also connect to external hosts through Fibre Channel and iSCSI interfaces. They also manage external mirroring and data migration tasks. Note that as there is no FICON connectivity there is no z/OS support, which is unusual for a mainstream IBM storage unit.
As every module contains processors, all the modules share equally in processing the workload so a single module can be lost with little performance impact.
The other two types of component are the Ethernet switches and the UPS units. The redundant Ethernet switches connect the data and interface modules together so that every module can interface directly to every other module.
The XIV can be scaled out by adding new modules and scaled up by upgrading existing modules. When a new module is added, because it contains all of storage, cache and processing power, performance and bandwidth capability increases in proportion.
If a new interface module is added, Ethernet and Fibre Channel interfaces are added in proportion.
The XIV can hold a maximum of 180 physical volumes which, with 3 TB drives, gives a maximum raw capacity of 540TB. The system is designed to be able to cope with losing a whole module and three disks in other modules without losing data, so it reserves the equivalent capacity of 1*12 disk module plus 3 disks for this. It also reserves another 4% of the space for Metadata, then the available space is reduced by 50% for partition copies, so the maximum effective capacity is 243 TB.
Solid State drives are a later addition, but these are not used in a conventional manner. Instead of being an extra tier of disks that requires tiering software for effective use, the SSDs sit in between the DRAM cache and the spinning disks as a second level of cache. They are primarily intended to improve random read hits.
XIV Logical Volume layout
The logical volumes as presented to the hosts are made up of 1 MB data units called partitions. These partitions are striped over all the physical disks and are also duplicated, with each copy held on different modules. The partition copies are called primary copy or secondary copy.
The mapping of logical volume partitions to physical disks and primary to secondary partitions is held in a distribution table and is carried out by the system at system startup. The distribution table is obviously a very critical component as the data would be inaccessible without it, so it is replicated over every module.
You have no control over where partitions are stored and, in fact, you cannot interrogate the mapping from logical volume to partition to physical volume.
The XIV calculates its space in decimal GB ( 1 decimal GB = 1,000,000,000 bytes, a 'normal' GB = 1024*1024*1024 = 1,073,741,824 bytes). This makes volume allocation a challenge as volume calculations normally use the higher value.
A logical volume is physically made up of 17 decimal GB chunks or 15.83 standard GB chunks, so it's best to define logical volume sizes as multiples of 17GB. You can define a maximum of 16,377 logical volumes including snapshots.
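The unit conversion and the rounding to 17GB chunks can be checked with a short calculation. This is a minimal sketch; the function names are illustrative and not part of any XIV tool.

```python
import math

DECIMAL_GB = 1_000_000_000   # bytes in a decimal GB
BINARY_GB = 1024 ** 3        # bytes in a 'normal' GB (1,073,741,824)
CHUNK_DECIMAL_GB = 17        # allocation unit quoted in the text


def chunk_in_binary_gb():
    """Size of one 17 decimal GB chunk expressed in binary GB."""
    return CHUNK_DECIMAL_GB * DECIMAL_GB / BINARY_GB


def allocated_decimal_gb(requested_decimal_gb):
    """Round a requested size up to the next multiple of 17 decimal GB."""
    chunks = math.ceil(requested_decimal_gb / CHUNK_DECIMAL_GB)
    return chunks * CHUNK_DECIMAL_GB


def allocated_for_binary_request(requested_binary_gb):
    """Decimal GB actually consumed by a request stated in binary GB."""
    requested_decimal_gb = requested_binary_gb * BINARY_GB / DECIMAL_GB
    return allocated_decimal_gb(requested_decimal_gb)
```

A request for 100 decimal GB therefore consumes 102 decimal GB (six chunks), while a request for 100 binary GB is about 107.4 decimal GB and consumes 119 decimal GB (seven chunks). Defining volumes as exact multiples of 17GB avoids that wasted space.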
The data is mirrored and striped over all the disks, which can be considered a form of RAID10, but IBM say this is not really the case as the distribution follows different rules.
The 1 MB partitions are 'pseudo-randomly' spread over the disks in a way that ensures that the partition pairs never reside in the same module, the data for each volume is spread evenly over all disks, and each logically adjacent partition on a volume is distributed across a different disk.
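The first of those rules, that the two copies of a partition never share a module, is easy to show in a toy placement routine. This is a sketch only: it uses Python's general-purpose random generator and enforces just that one rule, not the even-spread or adjacent-partition rules, and it is not IBM's distribution algorithm.

```python
import random


def place_partitions(num_partitions, modules=15, disks_per_module=12, seed=0):
    """Return a list of (primary, secondary) placements.

    Each placement is a (module, disk) tuple. The primary and secondary
    copies of a partition are always put in different modules.
    """
    rng = random.Random(seed)  # seeded so the layout is repeatable
    placement = []
    for _ in range(num_partitions):
        # sample() picks two distinct modules, so the copies cannot collide.
        primary_module, secondary_module = rng.sample(range(modules), 2)
        primary = (primary_module, rng.randrange(disks_per_module))
        secondary = (secondary_module, rng.randrange(disks_per_module))
        placement.append((primary, secondary))
    return placement
```

Because the two copies are always in different modules, losing any single module still leaves one copy of every partition readable.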
If you add more volumes, the system creates a new goal distribution which re-balances the data distribution to make sure it is still spread evenly over all the disks. So new physical disks are quickly used and contribute to overall system performance, with no action needed from you.
Logical volumes are 'thin provisioned', that is, the system only allocates physical space as it is required. The logical volume size is the one that is defined to the host, but the physical size is allocated in 17GB chunks as needed, until the physical size reaches the limit set by the logical size.
XIV Snapshots
Snapshots, or point-in-time copies of a volume, are fundamental to the XIV design. As the partitions that make up logical volumes are already tracked by pointers in the Distribution Table, it is very easy to create a snapshot by manipulating those pointers. Once a snapshot is created it is possible to update it, or even take another snapshot of it. Up to 16,000 snapshots can be created. Snapshots can be full refresh or differential, and it's possible to restore the original volume from a snapshot.
The XIV uses redirect on write to manage snapshots, that is, if data is updated, the new data is written out to a new partition. By contrast, with a copy-on-write snapshot, the old data must be copied over to a snapshot space before the new data can be written to disk. The proviso is that the update is going to be applied to the whole 1MB partition, otherwise the non-updated data must also be copied to the new location.
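A pointer-table model makes the redirect-on-write idea concrete. This is a simplified sketch: the `Store` and `Volume` classes are hypothetical, every write is assumed to replace a whole partition, and it is not a description of the real Distribution Table.

```python
class Store:
    """Toy pool of physical partitions, addressed by block id."""

    def __init__(self):
        self.blocks = {}
        self.next_id = 0

    def allocate(self, data):
        block_id = self.next_id
        self.blocks[block_id] = data
        self.next_id += 1
        return block_id


class Volume:
    """A volume is just a table of pointers into the store."""

    def __init__(self, store, pointers=None):
        self.store = store
        self.pointers = dict(pointers or {})

    def write(self, partition, data):
        # Redirect on write: the new data goes to a fresh block and only the
        # pointer changes. The old block is left untouched for any snapshot.
        self.pointers[partition] = self.store.allocate(data)

    def read(self, partition):
        return self.store.blocks[self.pointers[partition]]

    def snapshot(self):
        # Taking a snapshot copies pointers only, never the data itself.
        return Volume(self.store, self.pointers)
```

An update after a snapshot costs one write, where a copy-on-write design would first have to read the old data and write it to the snapshot space.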
Snapshots can be made to be consistent over several logical volumes by creating consistency groups. In this case I/O activity is suspended over all the volumes in the group until all the snapshots are created.
It is possible to partition the storage into independent groups of volumes called storage pools to simplify administration. You can set a maximum storage pool size for each pool, which could be useful for setting quotas on applications or user groups. A master volume and all of its associated snapshots are always a part of only one Storage Pool.
The XIV can be configured and managed with either a GUI interface or an XCLI interface. It is also possible to use the XIV as a host to other storage subsystems. This means you can migrate data from those subsystems in-band and non-disruptively.
HP
History
In terms of mainframe disk, HP has been a Hitachi reseller for some time, but while they buy VSP hardware from Hitachi, HP works in close collaboration with HDS and supplies its own software. They have decided to stay with the smaller form factor 2.5 inch disks which lowers the maximum capacity of a P9500, but they state that this gives them better performance and power consumption numbers, and that no-one really fills a mainframe box to maximum capacity anyway. I've always viewed HP as a major Intel player, but a supplier with limited presence in the mainframe market. HP certainly uses re-badged HDS subsystems for mainframe storage where they run a managed service.
P9500 Architecture
Because the HP P9500 is a re-badged Hitachi VSP, it has the same basic architecture.
HP mainframe software includes the following products
- VLVI Manager for Mainframe - used to reduce logical device contention and I/O queue times
- Business Copy for Mainframe - used to provide local mirror copies of mainframe volumes
- Continuous Access Synchronous for Mainframe - a PPRC equivalent synchronous remote copy
- Continuous Access Journal for Mainframe - an XRC equivalent asynchronous remote copy
- Logical Volume divider for Mainframe - works with Business Copy, renames datasets and creates a user catalog to make data accessible after a split operation.
For Open Systems solutions, HP software includes
- Storageworks Continuous Access which provides synchronous data mirroring between subsystems
- Storageworks Business Copy which provides full volume copy within the subsystem. This looks similar to EMC Timefinder rather than IBM FlashCopy
- Storageworks Virtualization system, an internal and external virtualisation manager, can be used for data migration and replication
- Storageworks LUN configuration and Security manager which is used to configure the XP12000, to define paths, array groups, volumes and LUNs
- StorageWorks Performance Advisor which monitors performance within the XP subsystem
The P9500 has the same open architecture as the HDS VSP and supports the same range of OEM devices, plus it supports HP MSA devices.
HP StorageWorks Enterprise Virtual Array
Several EVA models exist. The ones discussed here are the EVA8400 and the EVA6550.
Architecture
The EVA is built around a pair of HSV (Hierarchical Storage Virtualization) controllers fronting up to 37 disk enclosures, depending on the model. Every disk enclosure is connected to both controllers so any disk can be accessed from either controller.
Each HSV controller has four 4Gb/s FC host interfaces and contains the system cache, power supplies and enough battery power to maintain cache contents for up to 96 hours. The controllers connect up to nine drive enclosures with redundant FCAL. Larger EVA models support more FCAL loops and so can connect to more disk enclosures.
A disk enclosure can hold up to 12 drives, and the base rack can hold 18 enclosures, or 216 disk drives, which gives a maximum storage capacity of between 216 and 720 TB in a single rack. Connectivity within the enclosure is point-to-point.
The main differences between the models are cache sizes, varying from 8GB to 22GB, and total disk capacity, varying from 400GB to 720 TB. It may be possible to physically upgrade between models by swapping out controllers and reconnecting the disk enclosures, without having to migrate data between disks.
All models have the same replication capability and management software and all models support up to 256 hosts with a maximum 32TB LUN size. They support up to 2048 LUNs (up to 256 per HBA) ranging in size from 1GB to 32TB per Virtual disk, in 1GB increments.
Subsystems support 300GB, 450GB and 600GB FC disk drives running at 10K or 15K rpm and 1TB FATA drives. The 6550 subsystem supports 3TB SAS drives. Some models also support 200 or 400 GB solid state drives.
EVA Software
HP StorageWorks Continuous Access EVA will replicate data between subsystems, but only supports HP StorageWorks Enterprise Virtual Arrays. It normally runs at SAN distances, up to 20km, but can replicate data over longer distances if the HP StorageWorks IP Distance Gateway is added. This uses FCIP over the WAN.
The HP StorageWorks Business Copy incorporates Virtually Capacity-free Snapshot (Vsnaps), standard snapshots and Snapclones. Snaps can be managed through an HP Replication Solutions Manager GUI, or through a scripting interface.
It is possible to use an EVA as a tape library by adding the HP StorageWorks 12000 Virtual Library System software. This emulates several different library types and tape drive formats, and supports deduplication.
NetApp
History
NetApp was founded in 1992 and started out producing NetApp filers. A filer, or NAS device, has a built-in operating system that owns a filesystem and presents data as files and directories over the network. Contrast this with the more traditional block storage approach used by IBM and EMC, where data is presented as blocks over a SAN, and the operating system on the server has to make sense of it and carve it up into filespaces.
NetApp use their own operating system to manage the filers, called Data ONTAP, which has progressively developed over the years, partly by a series of acquisitions. In June 2008 NetApp announced the Performance Acceleration Module (or PAM) to optimize the performance of workloads which carry out intensive random reads.
Data ONTAP 8.0, released at the end of 2010, introduced two major features: 64-bit support, and the integration of the Spinnaker code to allow clustering of NetApp filers.
According to an IDC report in 2010, at that time NetApp was the third biggest company in the network storage industry behind EMC and IBM.
NetApp released the EF540 all flash storage array in 2013, with obvious performance benefits, and announced the FlashRay all flash design alongside it. FlashRay is in beta test in 2013, and is expected to go GA in 2014.
Architecture
Components
NetApp filers consist of Intel or AMD servers that are connected to RAM and NVRAM cache and disk shelves with PCI or PCI-e switches. The disk shelves contain FC, SATA and SAS disk drives connected with a redundant FC loop. The NVRAM adaptor is used as a write log to boost performance and can also be used to roll the log forward after an unplanned shutdown.
Two NetApp filers can be linked together to form an active/active cluster. PCIe slots are also available to connect external tape or disk storage.
The disk shelves can contain either SATA, FC or SAS disks and are formed into RAID groups, then the RAID groups are used to form 'aggregates'. Aggregates can then be split into what NetApp calls flexible volumes, and these can be dynamically resized. It is also possible to form traditional volumes from aggregates, and while it is possible to add capacity to them, capacity cannot be removed.
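The layering described above (RAID groups pooled into an aggregate, volumes carved out of it) can be pictured with a toy model. This is only an illustrative Python sketch that follows the description in the text; the class, its fields and the fully provisioned space accounting are assumptions, not NetApp's implementation:

```python
from dataclasses import dataclass, field


@dataclass
class Aggregate:
    """Hypothetical aggregate: a pool of space built from RAID groups."""
    raid_group_sizes_gb: list
    # volume name -> (size in GB, True if flexible, False if traditional)
    volumes: dict = field(default_factory=dict)

    @property
    def capacity_gb(self):
        return sum(self.raid_group_sizes_gb)

    @property
    def allocated_gb(self):
        return sum(size for size, _ in self.volumes.values())

    def create_volume(self, name, size_gb, flexible=True):
        if self.allocated_gb + size_gb > self.capacity_gb:
            raise ValueError("not enough free space in aggregate")
        self.volumes[name] = (size_gb, flexible)

    def resize_volume(self, name, new_size_gb):
        size_gb, flexible = self.volumes[name]
        # Per the text: traditional volumes can grow but cannot shrink.
        if new_size_gb < size_gb and not flexible:
            raise ValueError("traditional volumes cannot be reduced")
        if self.allocated_gb - size_gb + new_size_gb > self.capacity_gb:
            raise ValueError("not enough free space in aggregate")
        self.volumes[name] = (new_size_gb, flexible)


agg = Aggregate(raid_group_sizes_gb=[1000, 1000])
agg.create_volume("flex1", 500)
agg.resize_volume("flex1", 200)   # flexible volumes can shrink as well as grow
```

The only rule that distinguishes the two volume types in this model is the shrink check in `resize_volume`.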
File system
Data ONTAP is an operating system, and it contains a file system called Write Anywhere File Layout (WAFL) which is proprietary to NetApp. When WAFL presents data as files, it can act as either NFS or CIFS, so it can present data to both UNIX and Windows, and share that data between them.
Snapshots
Snapshots are arguably the most useful feature of Data ONTAP. It is possible to take up to 255 snapshots of a given volume. On UNIX, snapshots are stored in a .snapshot directory, and on Windows in ~snapshot. They are normally read only, though it is possible to form writeable snapshots called FlexClones or virtual clones.
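Snapshots work at disk block level and are based on inode pointers. WAFL does not overwrite a block in place, so changed data is written to a new block while the snapshot keeps pointing at the original one.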
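One way to picture pointer-based snapshots is a toy model in which data blocks are never overwritten and a snapshot is just a frozen copy of the pointer table. The Python sketch below is a simplified illustration under that assumption, not WAFL itself; the class and method names are hypothetical:

```python
class Volume:
    """Toy volume: writes always go to a new block, snapshots copy pointers."""

    def __init__(self):
        self.blocks = {}      # block id -> data, never overwritten in place
        self.active = {}      # file name -> block id (stands in for inode pointers)
        self.snapshots = {}   # snapshot name -> frozen copy of the pointer table
        self._next_block = 0

    def write(self, name, data):
        block_id = self._next_block
        self._next_block += 1
        self.blocks[block_id] = data
        self.active[name] = block_id      # only the active pointer moves

    def snapshot(self, snap_name):
        # Taking a snapshot copies pointers only, no data blocks.
        self.snapshots[snap_name] = dict(self.active)

    def read(self, name, snap_name=None):
        pointers = self.active if snap_name is None else self.snapshots[snap_name]
        return self.blocks[pointers[name]]


vol = Volume()
vol.write("report.txt", "version 1")
vol.snapshot("hourly.0")
vol.write("report.txt", "version 2")
print(vol.read("report.txt"))              # version 2
print(vol.read("report.txt", "hourly.0"))  # version 1
```

Because the snapshot holds pointers rather than data, it costs almost nothing until the active file system starts to diverge from it.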
SnapMirror is an extension of Snapshot and is used to replicate snapshots between 2 filers. Cascading replication, that is, snapshots of snapshots, is also possible. Snapshots can be combined with SnapVault software to get full backup and recovery capability.
SyncMirror duplicates data at RAID group, aggregate or traditional volume level between two filers. This can be extended with a MetroCluster option to provide a geo-cluster or active/active cluster between two sites up to 100 km apart.
SnapLock provides WORM (Write Once Read Many) functionality for compliance purposes. Records are given a retention period, and then a volume cannot be deleted or altered until all those records have expired. A full 'Compliance' mode makes this rule absolute, and 'Enterprise' mode lets an administrator with root access override the restriction.
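The retention rule above reduces to a small decision function. The Python sketch below is only an illustration of the rule as described in the text; the function name, its parameters and the choice to treat a record as expired on its expiry date are assumptions:

```python
from datetime import date


def can_delete_volume(record_expiry_dates, today, mode="compliance", is_root=False):
    """Return True if the volume may be deleted or altered.

    record_expiry_dates: retention expiry date of every record on the volume.
    mode: "compliance" (rule is absolute) or "enterprise" (root may override).
    """
    all_expired = all(expiry <= today for expiry in record_expiry_dates)
    if all_expired:
        return True
    # Unexpired records remain: only Enterprise mode with root access may override.
    return mode == "enterprise" and is_root


today = date(2013, 3, 1)
records = [date(2012, 12, 31), date(2015, 6, 30)]
print(can_delete_volume(records, today))                                   # False
print(can_delete_volume(records, today, mode="enterprise", is_root=True))  # True
```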
Models
The main NetApp models are grouped into 3 series, the 2000, 3000 and 6000 series. Detailed and up to date specifications can be found on the NetApp web site, but in general terms the differences between the models are shown below. Note that these are maximum capacities in active-active dual controller configurations.
| Maximum numbers \ Model range | 2220 - 2240 | 3210 - 3270 | 6220 - 6290 |
| Number of disks | 60 - 144 | 480 - 720 | 1,200 - 1,440 |
| Maximum raw disk capacity | 180 - 432 TB | 1,920 - 2,880 TB | 4,800 - 5,760 TB |
| Number of supported LUNs | 1024 | 4096 | 4096 |
| Cache size | 6 - 12 GB | 24 - 40 GB | 96 - 192 GB |
| Flash Cache size | | 1 - 2 TB | 3 - 16 TB |
Storage Subsystem Features table
The various suppliers of enterprise disks are contrasted in the tables below. The first row explains why the factor might be important, the second row just presents the facts, which were correct at time of writing, March 2013. However I'd advise you to check with your salesperson for up to date details.
| Vendor | IBM | IBM | EMC | HDS | HP | NetApp |
| Device | DS8870 961 | XIV | V-MAX | VSP | P9500 | FAS6290 |
| Subsystem Architecture | ||||||
| Internal Comms Architecture | See the previous page for an explanation of the various types of comms architecture | |||||
| PCI BUS | Infiniband switch | Virtual Matrix | PCI-e | PCI-e | PCI-e | |
| Internal Bandwidth | How fast can data move inside the box? The numbers quoted are marketing figures, you won't really see these numbers in practice. See the Architecture section for more information. | |||||
| 64 Gb/s | 40 Gb/s switch ports, 8 lanes | 400 Gb/s with 8 engines | 192 Gb/s | 192 Gb/s | 40 GB/s switches | |
| External Connectivity | How many external cables can you connect to the box, and how fast do they run? | |||||
| 4/8 port 8Gb FICON or FC | 24*8Gb/s FC; 22*1Gb/s iSCSI | Up to 128*8Gb FICON; 128*8Gb Fibre; 64 1Gb iSCSI; 64 GigE Ethernet | 112 ESCON; 112 FICON; 224 FC with 1024 virtual channels per physical port | 112 FICON; 224 FC with 1024 virtual channels per physical port | 64/64 * 8Gb/4Gb FC; 68 * GbE; 72 * 6Gb SAS | |
| Protocol Support | What kind of cables you can plug into the box. A good box will support a mixture of protocols. | |||||
| Ficon, Fibre Channel | Fibre Channel, iSCSI | Ficon, Fibre Channel, iSCSI, GB Ethernet | Ficon, Escon, Fibre Channel | Ficon, Fibre Channel | FC, FCoE, iSCSI, NFS, CIFS/SMB, HTTP, FTP | |
| Disk Connectivity | See the previous page for details of disk connectivity. | |||||
| 6Gb/s SAS to 8Gb/s Fiber backbone | SAS HBA PCIe 2.0 | FC-AL 4Gb 2 port FC | SAS | SAS | 6Gb SAS | |
| Storage Virtualisation Server | Can the storage subsystem act as a virtualisation engine in conjunction with a SAN? This enables lots of disparate storage to be controlled from one central point, including mirroring between different vendor's devices. | |||||
| No | No | No | Yes | Yes | No | |
| LPAR Capable | Can the storage subsystem be split logically, so it appears to be several separate systems, perhaps running different levels of microcode? | |||||
| No, but older models do support 2 LPARS with a 50/50 or 75/25 split. | No | No | Yes (32 LPARS, Z/OS data in single LPAR only) | Yes | No | |
| Subsystem Capacities | ||||||
| Maximum, and maximum effective capacity | How much data can you cram into the box? The maximum configured capacity will be less than the rated capacity, partly due to RAID overhead, and partly due to 3390 emulation overhead. The maximum EFFECTIVE capacity for a mainframe workload running IO intensive TP systems can be as little as 33% of the maximum capacity, if you want adequate performance. | |||||
| 450TB with 450GB FC disks; 2.3PB with 3TB SAS disks (raw) | 540TB raw, 243TB usable with 3TB SAS drives | Usable Capacity depends on RAID configuration, but is up to 3.8 PB. | 3,769TB raw with 3TB disks; RAID6 usable capacities: 3,267 TB (Open Systems), 3,107 TB (z/OS) | 2,269 TB raw with 3TB disks; RAID6 usable capacity 1,690 TB | 5,760 TB Raw | |
| Cache size | In theory, the bigger the cache, the better the performance, as you will get a better read-hit ratio, and big writes should not flood the cache. If the cache is segmented, it is more resilient, and has more data paths through it | |||||
| 16-1024 GB | 360GB | 2TB with 8 engines | 1024 GB; 192 concurrent control cache operations; 64 concurrent data cache operations | 1 TB | 192 GB | |
| Number of LUNs supported | ||||||
| 65,336, LUN or CKD. 2TB max size. | 4000, volume or snapshots | 65,280 | 65,280 | 4096 | ||
| Disk types | ||||||
| Physical disk size | How big are the real, spinning disks and how fast do they run? The bigger the disks, the less you pay for a terabyte, but bigger disks might be performance bottlenecks. If you have really large disks, then there should be fewer of them on an FC-AL loop. Faster speeds mean less rotational delay. | |||||
| 146, 300 GB at 15,000 rpm, and 600, 900 GB at 10,000 rpm FC; 3TB SATA | earlier models 1TB, 2TB SATA; Version 3, 2TB, 3TB SAS | between 300GB and 2TB FC; between 300 and 600GB SAS | 146, 300, 450, 600 GB FC, 15,000 rpm; 1TB, 2TB SATA, 7,200 rpm | 146, 300, 450, 600 GB FC, 15,000 rpm; 1TB, 2TB SATA, 7,200 rpm | 1TB/2TB/3TB SATA, 7.2K RPM; 450GB, 600GB SAS, 10,000 or 15,000 RPM; 600GB FC | |
| Flash Disk support | Does the subsystem support Flash disks? They can be used for data that requires very fast access | |||||
| 400GB SSD | up to 6TB SSD, but used as extra cache, not a storage tier. | 100GB to 400GB | 200GB or 400GB | 800 GB | 100GB | |
| RAID levels supported | See the RAID section for details | |||||
| 5,6,10 | RAID 10 equivalent | 1,5 (3+1 or 7+1),6 (6+2 or 14+2) | 1,5,6,10 | 1,5,6,10 | 4, 6, 10 | |
| Availability features | ||||||
| remote copy | Do you mirror data between two sites? If so you need this. The remote mirroring section has more details. | |||||
| Global Mirror, asynchronous; Metro Mirror (PPRC), synchronous | XIV Remote Mirroring, synchronous or asynchronous | Synchronous (SRDF/S) and asynchronous (SRDF/A) data replication between subsystems. SRDF/DM will migrate data between subsystems. SRDF/AR works with TimeFinder to create remote data replicas. SRDF products are all EMC to EMC. SRDF can emulate Metro mirror and Global mirror | Hitachi true copy, PPRC compatible and synchronous; Hitachi Universal Replicator, asynchronous copy. | StorageWorks replication | SyncMirror | |
| Instant copy | 'Instant Copy' of volumes or datasets. Can be used for instant backups, or to create test data. Some implementations require a complete new disk, and so double the storage. Some implementations work on pointers, and just need a little more storage. | |||||
| Flashcopy at volume and dataset level | redirect-on-write snapshot, flexible options | Timefinder at volume or dataset level. BCV version requires a complete volume be supplied, newer 'snap' version just uses pointers. EMC Compatible Flash (FlashCopy) | Shadow Image at volume level; Copy on write snapshot | StorageWorks copy software | SnapMirror | |
| Z/OS features | ||||||
| 3380/90 emulation | 3380 drives are older legacy technology and most sites have now converted to 3390. 3390 comes in multiple sizes, a 3390-3 will hold 2.8 GB. The newest model is the 3390-M. | |||||
| All models, including EAV | N/A | All models | All models, supports up to 65,536 logical devices (the older USPs just support 16,384 Open Systems devices) | All models, supports up to 65,536 logical devices | N/A | |
| GDPS support for automated site failover | See the GDPS pages for details | |||||
| Yes | N/A | Yes, including Hyperswap | Yes | Yes | N/A | |
| PAV and MA support | Parallel Access Volume and Multiple Allegiance. See the implementation tips section for details. Used to permit multi-tasking to logical devices | |||||
| Yes | N/A | Yes, including HyperPAV support | Yes | Yes, including HyperPAV support | N/A | |
| Manufacturer | IBM | IBM | EMC | HDS | HP | NetApp |
| Device | DS8870 961 | XIV | V-MAX | VSP | P9500 | FAS6290 |
Price is usually very negotiable, but make sure that the vendor quotes for a complete solution with no hidden extras. Also, make sure that you get capped capacity upgrade prices, including increased software charges, as software is usually charged by capacity tiers.