When this page first appeared in about 2003, IBM mainframes probably still carried out most of the business processing, although Open Systems processing on Unix and Windows devices was catching up fast. For that reason, my original definition of an enterprise storage system was one which supported Mainframe z/OS, UNIX variants, Netware and Windows. Now in 2019, while big financial organisations still use mainframes, z/OS is just a small part of the picture. However, this does not mean that I think that z/OS is dead, or even dying. If your organisation uses Mainframes, then z/OS support is critical and so it still plays a major part in the selection criteria below. However, I no longer think that an enterprise system has to support z/OS. The Open Systems world has changed too over the last 15 years: Netware has almost disappeared, of course, and Linux support is now important.
So the current definition of an enterprise disk subsystem is one which supports the Unix variants, Linux variants and Windows, with z/OS support a bonus.
The Cloud seems to be changing everything too. It is said that no-one is building data centers anymore (except Cloud providers), and if you don't build data centers you don't need disk subsystems. Of course, your Cloud provider will be hosting your data on disks of some kind, but one of the Cloud benefits is that you don't worry about that, as long as you can store and access your data. There are two dimensions to Cloud support: if you are a cloud provider then you want your subsystem to support partitioning, so you can isolate your customers' data. As an end customer, you might want your subsystems to support the Cloud as an ultimate archive tier.
Other game changers in recent years are virtualisation and flash drives. VMware support is critical for most companies, and the vendors emphasise all-Flash subsystems over Flash/Disk hybrid systems. We are moving rapidly to the state where Flash will replace spinning disk for most real time applications, relegating spinning disk to an archive storage role.
There are some new players coming onto the field and they are definitely ones to watch. Gartner ranks Pure Storage as best Flash Storage provider, and Huawei are entering the scene with their ES3000 V5 NVMe SSD disk. Non-Volatile Memory Express (NVMe) is providing a step change in performance and all the storage vendors are updating their product ranges to include NVMe as a storage tier, or even producing all-NVMe storage arrays. NVMe supports PCI Express, RDMA and Fibre Channel, and can support much higher bandwidths than SATA or SAS.
The links below will take you to discussions of the enterprise products from the five big enterprise vendors: EMC, HDS, IBM, HP and NetApp. The final link is to a table that compares some of their products.
EMC started out producing cache memory and developed solid state disks, memory devices that emulated spinning disks, but with much faster performance. These solid state disks were usually re-badged and sold by StorageTek.
Around 1988, EMC entered the storage market in its own name, selling Symmetrix disk subsystems with what at that time was a very large 256MB cache fronting 24GB of RAID 1 storage. Their mosaic architecture was the first to map IBM CKD mainframe disk format to standard FBA open system backend disks, and as such, could claim to be the first big user of storage virtualisation. In those days, EMC developed a reputation for delivering best performance, but at a price.
In 2008, EMC became the first to use flash storage in an enterprise subsystem, for high performance applications. EMC introduced their latest addition to the Symmetrix range, the V-MAX, in April 2009.
In September 2016, Dell bought out EMC and the company is now called Dell EMC.
The old Symms used the Direct Matrix architecture, now called Enginuity. The principle behind Direct Matrix is that all IO comes into the box from the front-end directors. These are connected to global memory cache modules, which are in turn connected to back-end directors that drive the IO down to the physical disks. This connectivity is all done by a directly connected, point-to-point fibre-channel matrix.
The V-MAX architecture builds on the older DMX architecture, but has some fundamental differences. The Directors and cache are combined together into a V-MAX engine. Each V-MAX engine contains two controllers and each controller contains Host and Disk ports, a CPU complex, cache memory and a Virtual Matrix interface.
The VMAX architecture is described in more detail in the VMAX Architecture section.
One interesting feature for hybrid systems is the storage tiering, based on Tier0 Flash storage, Tier1 FC drives and Tier2 SATA drives.
EMC FAST, or "fully automated storage tiering" checks for data usage patterns on files and moves them as required between Fibre Channel, SAS and flash drives to optimise cost effectiveness and performance requirements. Supported subsystems include the V-Max, the Clariion CX4 and the NS unified system.
FAST can also be configured manually to move application data to higher performing disk on selected days of the month or year. This could be useful for a monthly payroll application, for example.
EMC introduced FAST2 in August 2010, which introduced true LUN tiering and can manage data at block level.
The tiering concept has been extended further by adding a 'Cloud' layer, the EMC Cloud Array.
The VMAX F series are all flash, with 4 different models, the VMAX 250F, 450F, 850F and 950F. Capacities are quoted as 'effective', which assumes a 6:1 increase over usable capacity after data reduction. Storage is supplied in 'V-Bricks', each of which includes a VMAX engine and 53 TB of base capacity. Flash Capacity Packs let you scale up in 13 TB increments.
The 250F supports up to 2 V-Bricks and has an effective capacity of 1PB.
The 450F supports up to 4 V-Bricks and has an effective capacity of 2PB.
The 850F supports up to 8 V-Bricks and has an effective capacity of 4PB.
The 950F also supports up to 8 V-Bricks with an effective capacity of 4PB, but with faster engines than the 850F.
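The relationship between V-Bricks, Flash Capacity Packs and 'effective' capacity can be sketched as simple arithmetic. This is only an illustration using the figures quoted above (53 TB per V-Brick, 13 TB per pack, 6:1 data reduction); the function names are my own, 1 PB is assumed to be 1,000 TB, and real configurations will have limits that are not modelled here.

```python
# Figures taken from the text above; everything else is illustrative.
BASE_TB_PER_VBRICK = 53      # base usable capacity supplied with each V-Brick
PACK_TB = 13                 # usable capacity added by one Flash Capacity Pack
DATA_REDUCTION_RATIO = 6     # 'effective' capacity assumes 6:1 over usable


def usable_tb(vbricks: int, capacity_packs: int = 0) -> int:
    """Usable capacity in TB, before any data reduction is applied."""
    if vbricks < 1 or capacity_packs < 0:
        raise ValueError("need at least one V-Brick and a non-negative pack count")
    return vbricks * BASE_TB_PER_VBRICK + capacity_packs * PACK_TB


def effective_tb(vbricks: int, capacity_packs: int = 0) -> int:
    """Effective capacity in TB, assuming the quoted 6:1 ratio is achieved."""
    return usable_tb(vbricks, capacity_packs) * DATA_REDUCTION_RATIO


def usable_needed_tb(effective_target_tb: float) -> float:
    """Usable TB that must be installed to reach an effective target."""
    return effective_target_tb / DATA_REDUCTION_RATIO


if __name__ == "__main__":
    print(usable_tb(2, 4))                    # 2*53 + 4*13 = 158
    print(effective_tb(2, 4))                 # 158 * 6 = 948
    print(round(usable_needed_tb(1000), 1))   # 166.7 TB usable for 1 PB effective
```

The point to take away is that an 'effective' figure only holds if your data actually reduces at 6:1; data that is already compressed or encrypted will get much less.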
The V-MAX starts with the VMAX 100K, which supports up to 2 VMAX engines, each with 128 GB cache, and 24 to 1,560 disk drives giving a usable capacity of 500 TB. The virtual matrix bandwidth is 200GB/s.
The VMAX 200K supports up to 8 engines, each with 2048 GB cache, and up to 3,200 drives, but with a variety of different size disk and Flash drives in different RAID configurations, the total capacity is very much dependent on the configuration. The maximum formatted capacity with 3TB disk drives is 2.9 PB and maximum usable capacity in a RAID configuration is close to 2PB.
The VMAX 400K also supports up to 8 engines, but these are more powerful than the 200K engines. Each can support 2048 GB cache, giving a maximum cache capacity of 4 TB. It supports up to 2,400 drives with a formatted capacity close to 4PB, and a potential RAID5 or RAID6 usable capacity of 3.8PB. The main difference between the 200K and the 400K seems to be increased internal bandwidth, 400GB/s compared to 192 GB/s thanks to the 384 2.7 GHz Intel® Xeon Cores, and that the 400K supports 4TB drives which is where the capacity increase comes from.
DMX software includes EMC Symmetrix Management Console for defining and provisioning volumes and managing replication. The Time Finder products are used for in-subsystem and PIT replication, and SRDF for remote replication. SRDF can run in full PPRC compatibility mode, and can also replicate to three sites in a star configuration.
Enginuity 5784 adds new features including SRDF/EDP (Extended Distance Protection) which is similar to cascaded SRDF except that it uses a DLDEV (DiskLess Device) for the intermediate hop.
EMC was lacking in z/OS support for some years, but they have now licensed PAV and MA software from IBM, and have provided z/OS Storage Manager to manage mainframe volumes, datasets and replication.
GDPS support is provided, except for GDPS/GM or a three site GDPS/MGM solution.
In general, EMC subsystems are not Open, the exception being if they are fronted by an EMC VPLEX which allows different manufacturers' devices to co-exist with EMC. Software wise, SRDF will only work between EMC devices, and even then, not with all of them. EMC Open Replicator has the ability to take PIT copies from selected non-EMC subsystems to DMX, or to copy from DMX to selected non-EMC devices.
The V-MAX is a closed virtual system, as it cannot connect to storage subsystems within the EMC range.
Full intersite connectivity is available with VPLEX.
EMC VMAX3 and EMC VMAX support VMware and VMware Virtual Volumes, but check the VMware site for the up to date list of VMware product names, supported devices and firmware levels.
EMC historically had an issue with supporting z/OS features like FlashCopy and PPRC mirroring, as the equivalent EMC features were introduced earlier, and were arguably (at least by EMC) better. This became a problem when GDPS came along because, while Timefinder and SRDF worked fine, they did not work with GDPS. GDPS manages remote mirroring and site failover, but it does much more than just manage the storage; it also manages the failover of z/OS LPARs and applications. A lot of big sites use it and require that any disk purchase must be 100% GDPS compatible. EMC therefore licensed some of the IBM code to ensure good compatibility.
The EMC implementation of PPRC is called Symmetrix Compatible Peer and is built on SRDF/S code. Some minor differences are:
PPRC needs Fibre Channel path definitions between each z/OS LCU. A DS8000 uses the WWN for each FC adapter to define the links, but the VMAX does not use WWNs; it uses the serial number. This means that in the GEOPLEX LINKS definition of the GDPS Geoparm, you need to specify the link protocol as 'E', then define the links with the serial number (this was how ESCON links were defined, hence EMC uses the 'E' protocol).
Symmetrix Compatible Peer does not support cascaded PPRC, PPRC loopback configurations or Open Systems FBA disks.
For GDPS FREEZE to work correctly, the GDPS / PPRC CGROUP definitions must exactly match the SRDF GROUP definitions and link definitions in the VMAX config file.
If you use Hyperswap and FAST tiering, then the FAST performance stats are copied over when a hyperswap is invoked, so the disk performance will be maintained.
GDPS requires small dedicated utility volumes on each LCU to manage the mirroring. These volumes should not be confused with EMC GDDR Gatekeeper volumes, they have completely different purposes.
The VMAX will also support XRC, which means that it will support 2 sites synchronously mirrored with PPRC, then a third site asynchronously mirrored with XRC.
Hitachi Data Systems was always known as the company that manufactured disks that were exactly compatible with IBM, but worked a little faster and cost a little less. HDS broke that mould when they introduced the 'Lightning' range of subsystems in 2000, which was a merging of telephony cross-bar technology and storage subsystem technology. They extended and developed that architecture further with the USP (Universal Storage Platform), released in September 2004.
In September 2010 HDS released the Virtual Storage Platform (VSP), a purpose built subsystem that provides automated tiering between flash and spinning disk drives. This model was augmented in late 2015 with the VSP F range, all flash systems.
In 2019, HDS announced that they are freezing investment in their high end storage subsystems to concentrate on products with a higher profit margin, for example all Flash systems. Part of the rationale for this is that no-one actually uses the full throughput capacity of the VSP-G1500 so they see little point in developing it further at present.
Unlike competing storage subsystems, the VSP is not built from 'commodity' components, but uses parts designed and manufactured within HDS. HDS claims that this allows them to make a subsystem that outperforms its rivals.
The VSP is composed of 'racks', 'chassis' and 'boards'. The base model is a single rack and can be 'scaled up' by adding a second control rack and up to four disk racks. The base rack contains one control chassis and one drive chassis. The control chassis contains a number of functional boards, and more boards can be added to the first chassis to improve performance, and another disk chassis added to increase capacity. This is called 'scaling out'. Like the USP, the VSP supports adding external disks behind a virtualisation unit, and this is called 'scaling deep'.
There are five different kinds of functional boards.
The switched PCI-e architecture means that internal communication is non-blocking and every input port can connect to any piece of memory and every BED port can connect to any disk. This means that data does not need to be placed behind specific ports to ensure performance.
Hitachi has introduced the Hitachi Storage Virtualisation Operating System (SVOS RF) which is designed to optimise the performance of flash storage. The all flash systems below come with 2 storage module options, SSD or FMD, where FMD stands for Flash Module Device. The capacities below are shown for SSD modules; if FMD modules are installed then the capacity is reduced by half. However, Hitachi builds the FMD modules in house, rather than using commodity SSD devices, and claims they give up to 3 times better random read and 5 times better random write than commodity SSD. An FMD has 32 parallel paths to the flash storage, at least twice as many as standard SSD. This means that more NAND storage can be accessed, and also that channels can be dedicated to housekeeping work like garbage collection and wear levelling, so that this work does not interfere with host IO processing.
The All Flash models are
the F700 (256GB cache, 13 PB effective capacity)
the F900 (512GB cache, 17.3 PB effective capacity)
the F1500 (2014GB cache, 34.6 PB effective capacity)
The effective capacity figure assumes a 5 times improvement over raw capacity.
There are 6 hybrid disk / flash models, including the G700, G900 and G1500 with raw internal disk capacities of 11.7, 14 and 6.7 Petabytes respectively. The G1500 has a lower capacity than the G900, as it is optimised for performance.
The idea behind storage tiering is an old one - you keep your busiest data on fast, but expensive storage, then as it ages and becomes less busy you move it down the hierarchy to cheaper, slower storage. To achieve this, you had to solve two problems, first you had to run a report that identified data access profiles and use that report to work out what data was in the wrong place in this timeframe. Second, you had to move the incorrectly positioned data to the correct place in the storage hierarchy, a process that often required application downtime.
This data movement might involve whole volumes, or whole files. However in many cases files are active for part of the day and waste expensive disk space for the rest of the day. Some very large files can have parts that are very active and parts that are rarely accessed, and moving the whole file to expensive storage is wasteful.
HDS has addressed those problems with Hitachi Dynamic Tiering (HDT). Storage inside the VSP can be either fast but expensive SSD, or slower and cheaper SAS/SATA drives. When you allocate a virtual volume on a VSP, it stripes the data over all the physical volumes in 42MB chunks or pages and that striping can go over both SSD and spinning disk. The page size is much bigger than that used by other Storage manufacturers, and HDS has used that bigger size to allow it to position parts of files on different types of disk. The process is called sub-LUN tiering.
Page access is checked on a regular basis, and if a page becomes 'hot' it is automatically moved up to SSD disk, while pages that have cooled down again are moved back to SAS disks. This means that active parts of files are held on high performance SSD and inactive parts on SAS disk, so optimising SSD usage. HDS claims that this effectively means that all files are on fast disk.
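The idea of promoting hot pages and demoting cold ones can be sketched in a few lines. This is a toy model and not HDS's actual algorithm: the 42MB page size comes from the text above, but the function names, the simple access counting and the fixed number of SSD page slots are all my own assumptions.

```python
from collections import Counter

PAGE_MB = 42  # page size quoted in the text


def page_of(offset_mb: int) -> int:
    """Map an offset within a virtual volume (in MB) to its page number."""
    return offset_mb // PAGE_MB


def plan_tiers(access_offsets_mb, ssd_page_slots):
    """Decide which accessed pages belong on SSD and which on SAS.

    access_offsets_mb: offsets (MB) touched during one monitoring interval.
    ssd_page_slots:    how many pages the SSD tier can hold (hypothetical limit).
    Returns (ssd_pages, sas_pages) as two sets of page numbers.
    """
    heat = Counter(page_of(offset) for offset in access_offsets_mb)
    # Hottest pages first; ties are broken by page number to keep this deterministic.
    ranked = sorted(heat, key=lambda page: (-heat[page], page))
    return set(ranked[:ssd_page_slots]), set(ranked[ssd_page_slots:])


if __name__ == "__main__":
    accesses = [0, 10, 41, 42, 500, 500, 500, 1000]
    ssd, sas = plan_tiers(accesses, ssd_page_slots=2)
    print(sorted(ssd))  # [0, 11]  the two busiest pages
    print(sorted(sas))  # [1, 23]  touched only once each
```

Because the decision is made per page rather than per file or volume, the busy part of a large file can sit on SSD while the rest of the same file stays on cheaper disk.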
HDT is not just a VSP feature, it is also used on HDS NAS and Content Management storage systems.
Some other VSP features are:
Disk space is just allocated as needed, up to the size of the virtual volume. When data is deleted from the virtual volume, a Zero Page Reclaim utility returns unused storage pages to the spare pool.
Automatic Dynamic Rebalancing. When new physical volumes are added to the subsystem, virtual volume pages are re-striped to ensure they are still evenly spread over all the physical volumes.
Universal Virtualisation Layer. If you put some external storage behind the VSP then it is carved up and allocated to look the same as the internal storage. This means that mirroring, snapshot and replication software all work consistently for both internal and external storage.
Virtual Ports. Up to 1024 virtual FC ports can share the same physical port. Each attached server will only see its own virtual ports, which means they don't get to access each other's data. This feature allows the VSP to efficiently use the high bandwidth that is available on an individual port.
All data stored on the VSP is hardware encrypted for security.
Hitachi High Availability Manager provides non-disruptive failover between VSP and USP systems and means instant data access at the remote site if the primary site goes down. This is aimed at non-mainframe SAN based applications.
Mainframe availability uses Truecopy synchronous remote mirroring and Universal replicator with full support for GDPS.
The Storage Command suite includes.
The VSP systems support VMware virtual volumes through the Hitachi Storage Provider for VMware vCenter product.
The VSP is an open architecture, in that it works with disks from many other vendors and virtualises the data. The list of supported vendors includes EMC, HP, IBM and SUN, as well as older HDS devices. In general, the USP will support the hardware, but replaces the OEM replication software with its own.
The original IBM hard drive, the RAMAC 350, was manufactured in 1956, had a 24 inch (609mm) platter, and held 5 MB. The subsystem also weighed about 1 ton. That was a bit before my time, but when I joined IT, the storage market was dominated by IBM, the mainframe was king, and the standard disk type was the IBM 3380 model K, a native CKD device which contained 1.89 GB. IBM lost its market leader position to EMC sometime in the 1990s. CKD or Count Key Data was based around accessing physical tracks on spinning disks. CKD is now virtualised on FBA disks.
IBM introduced the DSxxxx series in late 2004 in response to competition from EMC and HDS. They updated their internal bus architecture to increase the internal transfer speed by 200% plus over the ESxxx series, and also abandoned their SSA disk architecture for a switched FC-AL standard. The DS8800 series is essentially a follow-on from the ESS disk series, and re-uses much of the ESS microcode.
IBM introduced the XIV in 2008. The XIV is Open Systems only and sells alongside the DS8880 series which supports Open and Mainframe systems.
The DS8880F Flash and DS8880 Hybrid storage families have greater throughput and run faster than previous models, to make use of the faster speeds from flash drives. They use POWER8 processors connected by Gen 3 I/O controllers, which can run at 3.891 GHz or 3.535 GHz.
The POWER8 processor can parallelise workloads by running in SMT4 or SMT8 mode. As up to 8 instruction lanes can be allocated to a processor, if one lane is blocked waiting for an IO response from a database or suchlike, then the processor can continue working with instructions from one of the other lanes. This means that the processor is kept busy and can process a lot more work and so improve the IO throughput of the DS8880 subsystem. The different DS8880 models offer different CPC configurations, ranging from a 6-core processor with 64GB of memory to a 40 core processor configured with 2TB of memory.
Drive enclosures come in two types, high performance flash enclosures and standard drive enclosures. The high performance flash enclosures are connected over a PCIe G3 fabric for improved IO performance and bandwidth. The standard drive enclosures use 8Gb/s four port Fibre Channel adapters that are connected to 8Gb switched FC-AL, with point to point SAS connections to each drive. This means that there are 4 paths from the DS8880 processor to each drive.
There are two families in the DS8880 series, the DS8880F which is all Flash, and the DS8880 flash / spinning disk Hybrid. All models will support RAID 5,6 or 10, and can use the IBM Cloud as an archive tier. The data is encrypted at rest on the disks, and the subsystems are VMware compatible.
The DS8880F family consists of 4 models: the DS8882F, the DS8884F, the DS8886F and the DS8888F. The DS8882F is a smaller 6 core model with up to 256GB cache, a maximum 368TB capacity and up to 16 FC or FICON channels. The DS8884F is also a 6 core model with up to 256GB cache, a maximum 1.4PB capacity and up to 64 FC or FICON channels.
The 86F and the 88F can be configured with 24 or 48 processing cores, will support up to a 2,048 GB cache and up to 128 FC or FICON channels. The 86F can hold up to 2.95PB, while the maximum 88F capacity is 5.89PB.
The DS8000 hybrid series has only two models available: the DS8886 and the DS8884. The DS8886 can hold up to 4.6 PB raw disk capacity and 1.46 TB Flash. It can be configured with up to a maximum of 24 cores and 2TB cache. The DS8884 is a single processor model with 6 cores and up to 256GB cache.
The DS8886 base cabinet holds 128 disk drives, up to two expansion cabinets can be added, each holding 256 disk drives. The raw disks are supplied in blocks of sixteen, but are configured in groups of eight, with each group being called an array group. All the disks in an array group must have identical size and rotation speed.
The DS8886 extent pools can be a mixture of SSD and spinning disk, so individual LUNs and mainframe CKD volumes can have some extents on SSD and some on Disk. If you already have discrete SSD and disk pools you can merge them together to create a mixed pool.
You can move LUNs or volumes manually and non-disruptively between storage tiers, but the Easy Tier product enhances this. It moves data at 1 GB storage stripe level rather than full volumes and the movement is policy based, depending on how active or hot a 1GB storage stripe extent is.
Manual movement is called ELMR or Entire-LUN Manual Relocation, while the automated stripe-based migration is called Easy stripe.
The DS software includes Flashcopy for internal subsystem point-in-time data copies, IBM Total Storage DS Manager for configuration and Metro/Global mirror for continuous inter-subsystem data replication.
The older ESS subsystems supported two kinds of z/OS Flashcopy, a basic version that just copied disks, and an advanced version that copied disks and files. DS only supports the advanced Flashcopy.
Flashcopy versions include:
multi-relationship, will support up to 12 targets;
Incremental, can refresh an old Flashcopy to bring the data to a new point-in-time without needing to recopy unchanged data;
Remote Mirror Flashcopy, permits dataset flash operations to a primary mirrored disk;
Inband Flashcopy commands, permits the transmission of flashcopy commands to a remote site through a Metro Mirror link;
Consistency Groups, flash a group of volumes to a consistent point-in-time. A consistency group can span multiple disk subsystems.
Remote mirroring versions include:
Metro Mirror, synchronous remote mirroring up to 300km, was PPRC;
Global Copy, asynchronous remote data copy intended for data migration or backup, was PPRC-XD;
Global Mirror, asynchronous remote mirroring;
Metro/Global Mirror, three site remote replication, two sites being synchronous and the third asynchronous;
z/OS Global Mirror, z/OS host based asynchronous remote mirror, was called XRC;
z/OS Metro/Global Mirror, three site remote replication, two sites being synchronous and quite close together, the third asynchronous and remote.
The DS8880 supports the VMware vSphere Web Client, but not VMware virtual volumes. However this may change so consult the IBM documentation for an up to date position. (The IBM FlashSystem V9000 does support VMware virtual volumes.)
The DS subsystem series is self contained and does not interface with any other vendor's storage subsystem. For Open Systems data, IBM does support mirroring and copying to other vendors' subsystems if they are fronted with SVC virtualisation.
In early 2008 IBM bought XIV, a small storage company based in Tel Aviv. The XIV is a different type of box for IBM, and they sell it alongside their DS8000 range as an open systems solution.
The XIV is based on a grid architecture of up to 15 interconnected but independent units called data modules. There is no common backplane; the modules are interconnected with InfiniBand switches. Each data module contains an Intel Xeon Processor, cache and up to 12 storage disks. Interface modules are a special type of data module and contain the above, but can also connect to external hosts through Fibre Channel and iSCSI interfaces. They also manage external mirroring and data migration tasks. Note that as there is no FICON connectivity there is no z/OS support, which is unusual for a mainstream IBM storage unit.
As every module contains processors, all the modules share equally in processing the workload so a single module can be lost with little performance impact.
The other two types of component are the Ethernet switches and the UPS units. The redundant Ethernet switches connect the data and interface modules together so that every module can interface directly to every other module.
The XIV can be scaled out by adding new modules and scaled up by upgrading existing modules. When a new module is added, because it contains all of storage, cache and processing power, performance and bandwidth capability increases in proportion.
If a new interface module is added, Ethernet and Fibre Channel interfaces are added in proportion.
The XIV can hold a maximum of 180 physical volumes, which with 6 TB drives, gives a maximum raw capacity of 1080 TB. The system is designed to be able to cope with losing a whole module and three disks in other modules without losing data, so it reserves the equivalent capacity of 1*12 disk module plus 3 disks for this. It also reserves another 4% of the space for Metadata, then the available space is reduced by 50% for partition copies, so the maximum effective native capacity is 485 TB. If IBM Random Access Compression Engine technology is used, then the effective capacity can be up to 970 TB.
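The capacity reductions described above can be followed step by step with a little arithmetic. This is just my reading of the steps in the text, with parameter names of my own choosing. Read literally, the steps give roughly 475 TB rather than the quoted 485 TB, so IBM's exact accounting presumably differs in some detail that is not given here.

```python
def xiv_capacity_tb(modules=15, disks_per_module=12, disk_tb=6.0,
                    spare_modules=1, spare_disks=3, metadata_fraction=0.04):
    """Return (raw_tb, net_tb) following the reductions described in the text.

    All defaults are the figures quoted above; the order in which the
    reductions are applied is an assumption.
    """
    total_disks = modules * disks_per_module
    raw_tb = total_disks * disk_tb

    # Space held back so a whole module plus three more disks can be lost.
    reserved_disks = spare_modules * disks_per_module + spare_disks
    after_spares_tb = (total_disks - reserved_disks) * disk_tb

    # A further 4% is set aside for metadata.
    after_metadata_tb = after_spares_tb * (1 - metadata_fraction)

    # Every partition is stored twice, so half of what is left is usable.
    net_tb = after_metadata_tb / 2
    return raw_tb, net_tb


if __name__ == "__main__":
    raw, net = xiv_capacity_tb()
    print(raw)            # 1080.0
    print(round(net, 1))  # 475.2
```

Whatever the exact figure, the useful rule of thumb is that less than half of the raw capacity ends up available for data, before any compression is taken into account.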
Solid State drives are a later addition, but these are not used in a conventional manner. Instead of being an extra tier of disks that requires tiering software for effective use, the SSDs sit in between the DRAM cache and the spinning disks as a second level of cache. They are primarily intended to improve random read hits.
The logical volumes as presented to the hosts are made up of 1 MB data units called partitions. These partitions are striped over all the physical disks and are also duplicated, with each copy held on different modules. The partition copies are called primary copy or secondary copy.
The mapping of logical volume partitions to physical disks and primary to secondary partitions is held in a distribution table and is carried out by the system at system startup. The distribution table is obviously a very critical component as the data would be inaccessible without it, so it is replicated over every module.
You have no control over where partitions are stored and in fact, you cannot interrogate the mapping from logical volume to partition to physical volume.
The XIV calculates its space in decimal GB (1 decimal GB = 1,000,000,000 bytes, a 'normal' GB = 1024*1024*1024 = 1,073,741,824 bytes). This makes volume allocation a challenge as volume calculations normally use the higher value.
A logical volume is physically made up of 17 decimal GB chunks or 15.83 standard GB chunks, so it's best to define logical volume sizes as multiples of 17GB. You can define a maximum of 16,377 logical volumes including snapshots.
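The conversion between the two kinds of GB, and the rounding up to whole 17GB chunks, can be checked with a short calculation. The constants come from the text above; the function names are mine and are only there for illustration.

```python
import math

DECIMAL_GB = 10**9        # 1,000,000,000 bytes
BINARY_GB = 2**30         # 1,073,741,824 bytes
CHUNK_DECIMAL_GB = 17     # allocation unit quoted in the text


def chunk_in_binary_gb() -> float:
    """Size of one 17 decimal GB chunk, expressed in binary GB."""
    return CHUNK_DECIMAL_GB * DECIMAL_GB / BINARY_GB


def allocated_decimal_gb(requested_decimal_gb: float) -> int:
    """Space actually consumed once a request is rounded up to whole chunks."""
    chunks = math.ceil(requested_decimal_gb / CHUNK_DECIMAL_GB)
    return chunks * CHUNK_DECIMAL_GB


def binary_to_decimal_gb(binary_gb: float) -> float:
    """Convert a size quoted in binary GB into decimal GB."""
    return binary_gb * BINARY_GB / DECIMAL_GB


if __name__ == "__main__":
    print(round(chunk_in_binary_gb(), 2))   # 15.83
    print(allocated_decimal_gb(100))        # 102, six chunks of 17
    print(allocated_decimal_gb(34))         # 34, exactly two chunks
    # A host asking for 100 binary GB needs about 107.4 decimal GB,
    # which rounds up to seven chunks.
    print(allocated_decimal_gb(binary_to_decimal_gb(100)))  # 119
```

So a volume requested as 100GB wastes 2GB if the request was in decimal GB, and does not fit in 102GB at all if the request was really in binary GB.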
The data is mirrored and striped over all the disks, which can be considered a form of RAID10, but IBM say this is not really the case as the distribution follows different rules.
The 1 MB partitions are 'pseudo-randomly' spread over the disks in a way that ensures that the partition pairs never reside in the same module, the data for each volume is spread evenly over all disks, and each logically adjacent partition on a volume is distributed across a different disk.
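The first of those rules, that the two copies of a partition never share a module, is easy to illustrate. The sketch below is my own toy placement routine, not IBM's distribution algorithm: it enforces only the different-module rule, leaves the even spread to chance, and ignores the rule about logically adjacent partitions.

```python
import random


def place_partitions(num_partitions, modules=15, disks_per_module=12, seed=0):
    """Pseudo-randomly place primary and secondary copies of each partition.

    Returns a list of ((module, disk), (module, disk)) pairs, one per
    partition, where the two copies are always in different modules.
    The seed makes the 'random' layout repeatable.
    """
    rng = random.Random(seed)
    placement = []
    for _ in range(num_partitions):
        # Picking two distinct modules guarantees the copies are separated.
        primary_module, secondary_module = rng.sample(range(modules), 2)
        primary = (primary_module, rng.randrange(disks_per_module))
        secondary = (secondary_module, rng.randrange(disks_per_module))
        placement.append((primary, secondary))
    return placement


if __name__ == "__main__":
    layout = place_partitions(1000)
    # No partition has both copies in the same module.
    print(all(p[0] != s[0] for p, s in layout))  # True
```

With the copies always in different modules, losing any one module still leaves at least one copy of every partition available.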
If you add more physical volumes, the system creates a new goal distribution which re-balances the data distribution to make sure it is still spread evenly over all the disks. So new physical disks are quickly put to use and contribute to overall system performance, with no action needed from you.
Logical volumes are 'thin provisioned', that is, the system only allocates physical space as it is required. The logical volume size is the one that is defined to the host, but the physical size is allocated in 17GB chunks as needed, until the physical size reaches the limit set by the logical size.
Snapshots, or point-in-time copies of a volume, are fundamental to the XIV design. As the partitions that make up logical volumes are already tracked by pointers in the Distribution Table, it is very easy to create a snapshot by manipulating those pointers. Once a snapshot is created it is possible to update it, or even take another snapshot of it. Up to 16,000 snapshots can be created. Snapshots can be full refresh or differential, and it's possible to restore the original volume from a snapshot.
The XIV uses re-direct on write to manage snapshots, that is, if data is updated, the new data is written out to a new partition. With a copy-on-write snapshot, the old data must be copied over to a snapshot space before the new data can be written to disk. The proviso is that the update is going to be applied to the whole 1MB partition, otherwise the non-updated data must also be copied to the new location.
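The difference between the two snapshot techniques shows up in how many physical writes an update costs. The two classes below are a minimal model of my own, not the XIV implementation, and they assume every update replaces a whole partition, which is the proviso mentioned above.

```python
class RedirectOnWriteVolume:
    """New data goes to a fresh location; a snapshot is just a copy of pointers."""

    def __init__(self, data):
        self.store = dict(enumerate(data))   # physical location -> contents
        self.live = list(range(len(data)))   # logical partition -> location
        self.next_free = len(data)
        self.physical_writes = 0

    def snapshot(self):
        return list(self.live)               # pointer copy, no data is moved

    def write(self, index, contents):
        self.store[self.next_free] = contents   # one write, to a new location
        self.live[index] = self.next_free
        self.next_free += 1
        self.physical_writes += 1

    def read(self, index, snap=None):
        table = self.live if snap is None else snap
        return self.store[table[index]]


class CopyOnWriteVolume:
    """Old data is copied aside before being overwritten in place."""

    def __init__(self, data):
        self.live = list(data)
        self.snapshots = []                  # each holds preserved old contents
        self.physical_writes = 0

    def snapshot(self):
        snap = {}
        self.snapshots.append(snap)
        return snap

    def write(self, index, contents):
        for snap in self.snapshots:
            if index not in snap:            # first change since this snapshot
                snap[index] = self.live[index]
                self.physical_writes += 1    # extra write to preserve old data
        self.live[index] = contents
        self.physical_writes += 1

    def read(self, index, snap=None):
        if snap is not None and index in snap:
            return snap[index]
        return self.live[index]


if __name__ == "__main__":
    for cls in (RedirectOnWriteVolume, CopyOnWriteVolume):
        vol = cls(["a", "b"])
        snap = vol.snapshot()
        vol.write(0, "A")
        print(cls.__name__, vol.read(0), vol.read(0, snap), vol.physical_writes)
    # RedirectOnWriteVolume A a 1
    # CopyOnWriteVolume A a 2
```

Both approaches return the same data for the live volume and the snapshot; the redirect-on-write version simply gets there with one write instead of two.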
Snapshots can be made to be consistent over several logical volumes by creating consistency groups. In this case I/O activity is suspended over all the volumes in the group until all the snapshots are created.
It is possible to partition the storage into independent groups of volumes called storage pools to simplify administration. You can set a maximum storage pool size for each pool, which could be useful for setting quotas on applications or user groups. A master volume and all of its associated snapshots are always a part of only one Storage Pool.
The XIV can be configured and managed with either a GUI interface or an XCLI interface. It is also possible to use the XIV as a host to other storage subsystems. This means you can migrate data from those subsystems in-band and non-disruptively.
The XIV supports the VMware ESXi OS.
HP entered the disk market with the HP 7935. They also introduced the first ever commercially produced hard drive in a 1.3 inch form factor in 1992. It had a capacity of 20 MB. HP has long produced its own range of open systems disk storage, and resells a modified version of the Hitachi VSP, called the XP7, for high end and mainframe connectivity.
Because the HP XP7 is a re-badged Hitachi VSP, it has the same basic architecture.
HP mainframe software includes the following products
For Open Systems solutions, HP software includes
The HP XP7 has the same open architecture as the HDS VSP and supports the same range of OEM devices, plus it supports HP MSA devices.
The HPE 3PAR StoreServ Storage Architecture consists of a number of Controller Nodes, each of which contains CPU, Cache, ASICs and host/disk connectivity. The Controller Nodes are interconnected by a high-speed, full-mesh backplane. Each controller node has a dedicated 4GB/s bi-directional link to each of the other nodes and a total of 56 of these links form the array’s full-mesh backplane. Also, each controller node may have one or more paths to hosts, either directly or over a SAN. As the Controller Nodes are clustered, the host servers can access volumes over any host-connected port, even if the physical storage for the data is connected to a different controller node. Controller node pairs are connected to dual-ported drive enclosures by a PCIe slot.
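The figure of 56 links is what you get for a full mesh of eight nodes if each direction between a pair of nodes is counted separately. The node count of eight is my assumption, as it is not stated above, and the function names are only for illustration.

```python
def directed_links(nodes: int) -> int:
    """Links in a full mesh when each direction is counted separately."""
    return nodes * (nodes - 1)


def node_pairs(nodes: int) -> int:
    """Number of distinct node-to-node connections in a full mesh."""
    return nodes * (nodes - 1) // 2


if __name__ == "__main__":
    print(directed_links(8))  # 56
    print(node_pairs(8))      # 28
```

The number of links grows roughly with the square of the node count, which is one reason full-mesh designs tend to stop at a modest number of controller nodes.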
The StoreServ 3PAR models are:
The HPE 3PAR StoreServ 20000 Storage, designated an enterprise flash array, with 9.6 PB usable capacity and a maximum cache of 51.6 TB.
The HPE 3PAR StoreServ 8000 Storage, with 4 PB usable capacity and a maximum cache of 384 GB.
The HPE 3PAR StoreServ 9000 Storage, with 6 PB usable capacity and a maximum cache of 896 GB.
NetApp was founded in 1992 and started out producing NetApp filers. A filer, or NAS device, has a built-in operating system that owns a filesystem and presents data as files and directories over the network. Contrast this with the more traditional block storage approach used by IBM and EMC, where data is presented as blocks over a SAN, and the operating system on the server has to make sense of it and carve it up into filespaces.
NetApp use their own operating system to manage the filers, called Data ONTAP, which has progressively developed over the years, partly by a series of acquisitions. In June 2008 NetApp announced the Performance Acceleration Module (or PAM) to optimize the performance of workloads which carry out intensive random reads.
Data ONTAP 8.0, released at the end of 2010, introduced two major features: 64-bit support, and the integration of the Spinnaker code to allow clustering of NetApp filers.
According to an IDC report in 2010, at that time NetApp was the third biggest company in the network storage industry, behind EMC and IBM.
NetApp released the EF550 Flash array device in 2013. This is an all flash storage array, with obvious performance benefits. The current (2018) all flash array, the AFF A700s 2-node cluster, will hold 3.3PB raw, on MSW SSD drives, fronted by a 1TB cache.
Data ONTAP is an operating system, and it contains a file system called Write Anywhere File Layout (WAFL) which is proprietary to NetApp. When WAFL presents data as files, it can act as either NFS or CIFS, so it can present data to both UNIX and Windows, and share that data between them.
All Flash systems use FlashEssentials, a variant of WAFL that is optimised for Flash. It includes things like amalgamating writes to free blocks to maximise performance and increase the flash media life; a new random read I/O processing path that was designed from the ground up for flash; and inline data reduction technologies, including inline compression, inline deduplication, and inline data compaction. This means that the raw subsystem capacities quoted below can be multiplied by 4 to get the effective capacity.
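The raw-to-effective conversion is a simple multiplication, sketched below. The function name is hypothetical, and the default ratio of 4 is just the figure quoted above; the reduction you actually get depends on how well your data compresses and deduplicates.

```python
def effective_capacity(raw_tb: float, reduction_ratio: float = 4.0) -> float:
    """Estimate effective capacity from raw capacity and a data reduction ratio."""
    if raw_tb < 0 or reduction_ratio <= 0:
        raise ValueError("raw capacity must be >= 0 and the ratio must be > 0")
    return raw_tb * reduction_ratio


if __name__ == "__main__":
    # 100 TB raw at the quoted 4:1 ratio, and at a more cautious 2:1.
    print(effective_capacity(100))
    print(effective_capacity(100, reduction_ratio=2.0))
```

It is worth running the sum with a pessimistic ratio as well, as the headline figure is a best case for data that reduces well.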
Snapshots are arguably the most useful feature of Data ONTAP. It is possible to take up to 255 snapshots of a given volume and up to 255,000 per controller. On UNIX, snapshots are stored in a .snapshot directory, and on Windows in ~snapshot. They are normally read only, though it is possible to create writeable snapshots called FlexClones or virtual clones.
Snapshots are based at disk block level and use move-after-write techniques, based on inode pointers.
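The idea of a pointer-based snapshot can be illustrated with a toy model. This sketch assumes redirect-on-write behaviour, where new data always goes to a fresh block and only the pointer changes. The class and its fields are hypothetical and much simpler than WAFL's real inode structures.

```python
class Volume:
    """Toy pointer-based volume with redirect-on-write snapshots."""

    def __init__(self):
        self.blocks = {}     # physical block id -> data
        self.active = {}     # logical block number -> physical block id
        self.snapshots = {}  # snapshot name -> frozen copy of the pointer map
        self._next_id = 0

    def write(self, lbn, data):
        # Never overwrite in place: store the data in a new physical block
        # and repoint the active map. Old blocks stay put for any snapshot.
        pid = self._next_id
        self._next_id += 1
        self.blocks[pid] = data
        self.active[lbn] = pid

    def snapshot(self, name):
        # Taking a snapshot copies pointers only, not data blocks.
        self.snapshots[name] = dict(self.active)

    def read(self, lbn, snapshot=None):
        pointers = self.active if snapshot is None else self.snapshots[snapshot]
        return self.blocks[pointers[lbn]]


if __name__ == "__main__":
    vol = Volume()
    vol.write(0, "old data")
    vol.snapshot("hourly.0")
    vol.write(0, "new data")
    print(vol.read(0), "|", vol.read(0, snapshot="hourly.0"))
```

The snapshot costs almost nothing when it is taken. Space is only consumed later, as changed blocks accumulate alongside the originals that the snapshot still points to.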
SnapMirror is an extension of Snapshot and is used to replicate snapshots between 2 filers. Cascading replication, that is, snapshots of snapshots, is also possible. Snapshots can be combined with SnapVault software to get full backup and recovery capability.
SyncMirror duplicates data at RAID group, aggregate or traditional volume level between two filers. This can be extended with a MetroCluster option to provide a geo-cluster or active/active cluster between two sites up to 100 km apart.
SnapLock provides WORM (Write Once Read Many) functionality for compliance purposes. Records are given a retention period, and a volume cannot then be deleted or altered until all those records have expired. A full 'Compliance' mode makes this rule absolute, and 'Enterprise' mode lets an administrator with root access override the restriction.
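The difference between the two modes can be expressed as a simple decision rule. This is a sketch of the logic as described above, not NetApp's implementation: the function name, the mode strings and the rule that a record expires once its retention date has passed are all assumptions made for illustration.

```python
from datetime import date
from typing import Iterable


def can_alter_volume(retention_dates: Iterable[date], today: date,
                     mode: str, is_root: bool = False) -> bool:
    """Decide whether a WORM volume may be altered or deleted."""
    if mode not in ("compliance", "enterprise"):
        raise ValueError(f"unknown mode: {mode}")
    # Assumption: a record has expired once its retention date has passed.
    all_expired = all(d < today for d in retention_dates)
    if all_expired:
        return True
    # Unexpired records remain: only enterprise mode allows a root override.
    return mode == "enterprise" and is_root


if __name__ == "__main__":
    records = [date(2018, 12, 31), date(2020, 1, 1)]
    today = date(2019, 4, 1)
    print(can_alter_volume(records, today, "compliance", is_root=True))
    print(can_alter_volume(records, today, "enterprise", is_root=True))
```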
The NetApp models are grouped into 3 series: All-Flash, Hybrid and Object stores. Detailed and up to date specifications can be found on the NetApp web site, but in general terms, the differences between the models are shown below. Each model uses in-line data reduction, which increases the effective capacity to 5-10 times the raw capacity. Data updates use redirect-on-write techniques and all have Cloud connectivity for data archiving. Replication can be provided using MetroCluster (synchronous) or SnapMirror (asynchronous) and these can be combined into a three site configuration. The all-Flash and Hybrid models come in HA pairs and more pairs can be added to form a scale out cluster. It is possible to combine all-Flash and Hybrid models in the same cluster.
|Subsystem type||Model||Max Capacity||Max Cache||Connectivity|
|All Flash||AFF A800 (12 HA pairs)||316PB||-||NVMe/FC, FC, iSCSI, NFS, pNFS, CIFS/SMB|
|All Flash||AFF A700 (12 HA Pairs)||702PB||-||FC, iSCSI, NFS, pNFS, CIFS/SMB|
|All Flash||AFF A200 (12 HA Pairs)||193PB||-||FC, iSCSI, NFS, pNFS, CIFS/SMB|
|Hybrid||FAS9000||176PB||1TB||12Gb SAS, 40GbE, 32GbFC, 10GbE|
|Hybrid||FAS2650||1.243PB||64GB||FC, FCoE, iSCSI, NFS, pNFS, CIFS/SMB|
This first table is a simplistic attempt to compare some of the all-flash subsystems from the traditional vendors, and one new one. It's difficult to get meaningful comparisons yet, as some of these subsystems are targeted at different applications, so this should be considered an indication of what is available. NVMe systems have been selected where possible, and the EMC PowerMax 8000 is the only one here with FICON (and therefore Mainframe) support, but there are other all-flash mainframe systems out there.
The HPE 3PAR StoreServ 9000 is the only SAS/SSD system on the list. HP do provide NVMe storage for servers, so if they do not have an NVMe subsystem available now, then doubtless they have one in the pipeline.
|Device||//X90||PowerMax 8000||UCP HC V124N||3PAR StoreServ 9000||FlashSystem 9100F||AFF A700|
|Flash Disk Types|
|Capacity||How much data can you cram into the box? Can be quoted as 'raw' capacity, 'usable' capacity once RAID overhead is calculated, and 'effective' capacity after compression. 'PiB' counts in powers of 1024, 'PB' in powers of 1000.|
|1 PB Native, 3 PB Effective||4 PB Effective||80 TB raw, 210 TB effective||6000 TiB||On a 4-way cluster; 1.8 PB raw, 1.5 PB usable, 3 PB Effective||702.7 PB; 623.8 PiB|
|Internal Connectivity||See the previous page for details of disk connectivity.|
|External Connectivity||What kind of cables you can plug into the box. A good box will support a mixture of protocols.|
|16 Gb/s FC, 10/40 Gb/s Ethernet (iSCSI), 1/10 Gb/s replication||16 Gb/s FC, 10 Gb/s Ethernet (iSCSI), 16 Gb/s FICON||10 Gb/s Ethernet, 25 Gb/s Ethernet||10/32 Gb/s FC, 10 Gb/s Ethernet (iSCSI), 10 Gb/s Ethernet (FCoE)||10 Gb/s Ethernet (iSCSI), 25 Gb/s Ethernet (iSCSI, iWARP, RoCE); 16/32 Gb/s FC, FC-NVMe||NVMe/FC, FC, FCoE, iSCSI, NFS, pNFS, SMB|
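Since the capacity figures above mix decimal (PB) and binary (TiB, PiB) units, a quick conversion helps when comparing one vendor's number with another's. The helper names below are my own; the unit definitions themselves are standard.

```python
PB = 10 ** 15   # petabyte, decimal
TIB = 2 ** 40   # tebibyte, binary
PIB = 2 ** 50   # pebibyte, binary


def pb_to_pib(pb: float) -> float:
    """Convert decimal petabytes to binary pebibytes."""
    return pb * PB / PIB


def tib_to_pb(tib: float) -> float:
    """Convert binary tebibytes to decimal petabytes."""
    return tib * TIB / PB


if __name__ == "__main__":
    print(f"1 PB     = {pb_to_pib(1):.3f} PiB")
    print(f"6000 TiB = {tib_to_pb(6000):.2f} PB")
```

A capacity quoted in PB looks about 12.6% bigger than the same capacity quoted in PiB, so check which unit a quote uses before comparing.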
The various suppliers of hybrid flash/HDD enterprise disks are contrasted in the tables below. For each factor, the first row explains why it might be important and the second row presents the facts, which were correct at the time of writing, April 2019. However, I'd advise you to check with your salesperson for up to date details.
|Device||DS8886||XIV 2812-314||V-MAX 400K||VSP G1500||HP XP7||FAS9000|
|Internal Comms Architecture||See the previous page for an explanation of the various types of comms architecture|
|PCI-e gen3||Infiniband switch||Virtual Matrix||PCI-e||PCI-e||PCI-e|
|Internal Bandwidth||How fast can data move inside the box? The numbers quoted are marketing figures, you won't really see these numbers in practice. See the Architecture section for more information.|
|192 Gb/s per server, which gives 384 Gb/s for 2 servers||480 Gb/s||1,400 Gb/s with 8 engines||384 Gb/s||384 Gb/s||2 x 40 GB/s switches|
|External Connectivity||How many external cables can you connect to the box, and how fast do they run. Numbers quoted are maximum for each type, and if the maximum is installed then that may mean no other port types can be installed. NetApp is for 24 node NAS model.|
|4 and 8-port 8 Gbps or 4-port 16 Gbps Fibre Channel/IBM FICON to a max of 128 ports||24 x 8 Gb/s FC; 22 x 10 Gb/s iSCSI||128 x 10 Gb/s SRDF; max 256 x 8/16 Gb/s combination of FC, FICON, FCoE, iSCSI||192 x 16/32 Gb/s FC, 176 FICON, 40 x 10 Gb/s iSCSI||96 x 16/8 Gb/s Fibre Channel; 192 x 8 Gb/s Fibre Channel; 176 x 8 Gb/s FICON; 192 x 10 Gb/s FCoE; 88 x 10 Gb/s iSCSI||12 Gb SAS, 40 GbE, 32 GbFC, 10 GbE|
|Protocol Support||What kind of cables you can plug into the box. A good box will support a mixture of protocols.|
|FICON, Fibre Channel||Fibre Channel, iSCSI, FCoE||Fibre Channel, GbE, iSCSI, FCoE, FICON, SRDF||NFS, SMB, FTP, iSCSI, HTTP to Cloud||FC, FICON, FCoE, iSCSI, HTTP to Cloud||FC, FCoE, iSCSI, NFS, pNFS, CIFS/SMB|
|Disk Connectivity||See the previous page for details of disk connectivity.|
|PCIe Gen 3 connection to an 8 Gbps FC-AL backbone||SAS HBA PCIe 2.0||PCIe Gen 3 to 6 Gb/s 2 port SAS drives||6 Gb/s SAS||6 Gb/s SAS||6 Gb/s / 12 Gb/s SAS|
|Storage Virtualisation Server||Can the storage subsystem act as a virtualisation engine in conjunction with a SAN? This enables lots of disparate storage to be controlled from one central point, including mirroring between different vendor's devices.|
|No||No||Yes, with FAST.X option||Yes||Yes||No|
|Maximum, and maximum effective capacity||How much data can you cram into the box? Can be quoted as 'raw' capacity, 'usable' capacity once RAID overhead is calculated, and 'effective' capacity after compression.|
|5.87 PB HDD SAS disks and 614 TB Flash||2 PB effective||Usable Capacity depends on RAID configuration, but is up to 4 PB.||8 PB FMD; 17.3 PB SSD; 14 PB HDD||34 PB raw, 29 PB usable; 255 PB External Storage||NAS: 14.7 PB per HA pair, max 176 PB with 12 pairs; SAN: 7.4 PB per HA pair, max 88 PB with 12 pairs|
|Cache size||In theory, the bigger the cache, the better the performance, as you will get a better read-hit ratio, and big writes should not flood the cache. If the cache is segmented, it is more resilient, and has more data paths through it|
|2 TB||1.4 TB, plus 12 TB flash cache.||16 TB||2 TB||2 TB||1TB - 12TB with 12 HA pairs|
|Number of LUNs supported|
|65,336, LUN or CKD; 1 TB CKD max. size, 16 TB max. LUN size||4000, volume or snapshots||64,000||65,280, 256 TB max LUN size||65,280, 256 TB max LUN size||8,192|
|Flash Disk support||How much flash capacity can be supplied|
|200, 400, 800, 1,600 GB flash drives; 400 GB to 3.2 TB high performance flash cards||up to 12 TB SSD, but used as extra cache, not a storage tier.||3.5" SAS Drives: 800 GB, 1.6 TB; 2.5" SAS Drives: as above plus 960 GB, 1.92 TB||200, 400, 800, 1,900 GB flash drives||Hybrid; 960 GB, 1.9 and 3.8 TB flash drives||960 GB + 4 TB, 960 GB + 8 TB, 960 GB + 10 TB|
|Physical disk size||How big are the real, spinning disks and how fast do they run? The bigger the disks, the less you pay for a terabyte, but bigger disks might be performance bottlenecks. If you have really large disks, then there should be fewer of them on an FC-AL loop, and you should avoid RAID5 as rebuild times will be too long. Faster speeds mean less rotational delay.|
|300, 600 GB; 1.2, 1.8, 4, 6 TB disk||4 TB or 6 TB nearline SAS||3.5" SAS Drives: 10K RPM 300 GB, 600 GB; 7.2K RPM 4 TB. 2.5" SAS Drives: 10K RPM 300 GB, 600 GB, 1.2 TB, 1.6 TB, 1.92 TB; 15K RPM 300 GB||600 GB, 1.2, 1.8, 2.4 TB faster disks; 4, 6, 10 TB slower disks||10 TB SAS at 7.2K RPM||14, 6, 8, 10 TB at 7.2K RPM; 900 GB, 1.2 TB, 1.8 TB at 10K RPM|
|RAID levels supported||See the RAID section for details|
|5,6,10; raid5 is not supported for drives bigger than 1TB||RAID 10 equivalent||RAID 1||1+0,5,6||1,5,6||4, 6|
|remote copy||Do you mirror data between two sites? If so you need this. The remote mirroring section has more details.|
|Global Mirror, asynchronous; Metro Mirror (PPRC), synchronous; 3 site MGM also supported||XIV Remote Mirroring, synchronous or asynchronous||Synchronous (SRDF/S) and asynchronous (SRDF/A) data replication between subsystems. SRDF/DM will migrate data between subsystems. SRDF/AR works with TimeFinder to create remote data replicas. SRDF products are all EMC to EMC. SRDF can emulate Metro Mirror and Global Mirror.||Hitachi TrueCopy, PPRC compatible and synchronous; Hitachi Universal Replicator, asynchronous copy.||Storageworks replication||MetroCluster (sync.), SnapMirror (async.); 3 site solution possible|
|Instant copy||'Instant Copy' of volumes or datasets. Can be used for instant backups, or to create test data. Some implementations require a complete new disk, and so double the storage. Some implementations work on pointers, and just need a little more storage.|
|FlashCopy at volume and dataset level||redirect-on-write snapshot, flexible options||TimeFinder at volume or dataset level. BCV version requires a complete volume be supplied, newer 'snap' version just uses pointers. EMC Compatible Flash (FlashCopy)||ShadowImage at volume level; copy-on-write snapshot||Storageworks copy software||SnapMirror|
|GDPS support for automated site failover||See the GDPS pages for details|
|Yes||N/A||Yes, including HyperSwap||Yes||Yes||N/A|
|PAV and MA support||Parallel Access Volume and Multiple Allegiance. See the implementation tips section for details. Used to permit multi-tasking to logical devices|
|Yes||N/A||Yes, including HyperPAV support||Yes||Yes, including HyperPAV support||N/A|
Price is usually very negotiable, but make sure that the vendor quotes for a complete solution with no hidden extras. Also, make sure that you get capped capacity upgrade prices, including increased software charges, as software is usually charged by capacity tiers.