The PowerMax architecture builds on the older V-MAX and DMX architectures, but it uses end-to-end Non-Volatile Memory Express (NVMe). NVMe is discussed in detail on the NVMe page, but in brief, NVMe is a command set that runs over PCI Express (PCIe) and replaces the older SCSI protocol. It allows a storage subsystem to exploit the faster speeds of not just NAND flash storage, but also the faster Storage Class Memory (SCM) technology.
A PowerMax configuration consists of PowerMax Bricks, the building blocks for a PowerMax array. Each Brick includes an engine and two Disk Array Enclosures (DAEs). The engine is the central I/O processing unit, and it contains redundant directors with multi-core CPUs and memory modules. Each Brick comes with a pair of 2.5 inch DAEs, each of which can hold up to 24 NAND or SCM drives connected by 2 NVMe ports. Bricks communicate using a Dynamic Virtual Matrix Architecture, which runs over redundant internal InfiniBand fabrics that provide the high-bandwidth backbone needed for an all-flash array.
The PowerMax uses two types of Bricks:
Both PowerMax 2000 and PowerMax 8000 support 1.92 TB, 3.84 TB, 7.68 TB, and 15.36 TB NVMe flash drive capacities as well as 750 GB and 1.5 TB SCM drives. All drives are 2.5” and feature a dual-ported U.2 form factor with a PCIe interface.
The entry level PowerMax 2000 is an NVMe scale-out array which supports Open Systems only. The PowerMax 2000 accommodates up to two Bricks in half a standard 19-inch rack. It will hold up to 96 flash or SCM devices, giving a maximum usable storage capacity of 1PB, served by 64 front end ports.
Each brick can hold a 512GB, 1TB or 2TB cache, to a maximum of 4TB with 2 bricks. Cache is mirrored within the engine across the directors.
The PowerMax 8000 supports both Open Systems and Mainframes. It houses up to four Bricks in a single cabinet, and up to eight Bricks in total. Up to 288 flash or SCM devices can be installed, giving a maximum usable storage capacity of 4PB, served by 256 front end ports.
Each brick can hold 1TB or 2TB cache, to a maximum of 16TB with 8 bricks. On multi-engine PowerMax 8000 systems, the cache is mirrored across directors in different engines for added redundancy.
Individual PowerMax storage disks are split up into data device segments or physical chunks of data called TDATs. These TDATs are then combined into Storage Tiers, where the TDATs in an individual tier all have the same technology and performance characteristics. The Storage Tiers are then combined into Storage Resource Pools, or SRPs, and the tiers within an SRP can have different performance characteristics.
There are two ways to measure the capacity of an SRP. TBu, or usable capacity, is the physical capacity that is available after RAID overhead and dynamic spare capacity are accounted for. When a host accesses this usable capacity, it does so through thinly provisioned front-end storage devices called TDEVs. A TDEV takes into account both thin provisioning and data reduction techniques like dedup and compression. So the effective capacity as seen from a host, or TBe, is much greater than the TBu physical capacity. This distinction is important when provisioning, as the TBu is used to work out the allocation of physical drives, while the TBe is used for cache sizing.
TDEVs are allocated to SRPs and assigned a Service Level. If the SRP has tiers with different performance classes, then the data is allocated across the SRP by an Automated Data Placement (ADP) utility. ADP uses machine learning techniques and predictive analytics to place the data on the storage tier that is required to meet service level response times.
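The relationship between TBu and TBe can be illustrated with a little arithmetic. This is a minimal sketch that assumes the combined effect of dedup and compression can be summarised as a single data reduction ratio; the function names and the 3:1 ratio used in the usage note are hypothetical illustrations, not Dell EMC figures or sizing tools.

```python
def effective_capacity_tbe(usable_tbu: float, data_reduction_ratio: float) -> float:
    """Estimate effective capacity (TBe) from usable capacity (TBu).

    data_reduction_ratio is an assumed combined dedup + compression ratio,
    e.g. 3.0 means 3 TB of host data fits in 1 TB of usable capacity.
    """
    if usable_tbu < 0 or data_reduction_ratio <= 0:
        raise ValueError("capacity must be >= 0 and ratio must be > 0")
    return usable_tbu * data_reduction_ratio


def usable_capacity_needed_tbu(target_tbe: float, data_reduction_ratio: float) -> float:
    """Work backwards: how much usable capacity a target TBe would require."""
    if target_tbe < 0 or data_reduction_ratio <= 0:
        raise ValueError("capacity must be >= 0 and ratio must be > 0")
    return target_tbe / data_reduction_ratio
```

With an assumed 3:1 ratio, 100 TBu would present as 300 TBe to the hosts, and a 300 TBe requirement would need 100 TBu of physical drives.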
PowerMax uses Smart RAID, an active/active RAID group accessing scheme which allows RAID groups to be shared across directors, giving each director active access to all drives on the Brick or zBrick.
The diagram below explains how all these components fit together.
PowerMax has full support for SCM. SCM is a relatively new technology (in 2020) and is faster, but more expensive, than NAND flash storage, while being slower and cheaper than the memory used for cache storage. Unlike most cache storage, the contents of SCM are also retained after power loss.
It is possible to configure a PowerMax system as 100% SCM, but SCM would normally be used in a tiering configuration with flash drives. Whereas in the past a storage tier system was built with spinning disk and flash storage, the new paradigm is flash for data that can handle slower performance and SCM for data that needs fast performance. The PowerMax achieves this by intermixing SCM with traditional NAND flash drives in the DAEs. To ensure the highest levels of performance on the intermixed systems, the data on the SCM tier 0 is never compressed; however, it can be deduped. To use the SCM, place the data in storage groups that are assigned the 'Diamond' service level, as 'Silver' or 'Bronze' groups will always reside on NAND flash.
Dell EMC recommends that the SCM storage be between 3% and 12% of the effective system capacity. There are a few restrictions on how the SCM can be configured. For example, the SCM storage must be of the same RAID type as the NAND flash, and all engines must be configured identically with respect to SCM for I/O balance, that is, with the same number and types of RAID groups.
NAND Flash devices have a limitation on the number of writes that can happen to an individual cell, before the insulating substrate breaks down. This is discussed on the Flash Storage page. The PowerMax has a few features that are designed to mitigate this limitation and extend the life of the flash storage.
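The service level rule described above can be captured as a small lookup. This is only a sketch of the rule as stated in the text: the function name is hypothetical, and only the three service levels named here are covered, so any other level is rejected rather than guessed at.

```python
# Whether a storage group's data is eligible for the SCM tier,
# according to the rule stated in the text.
SCM_ELIGIBILITY = {
    "diamond": True,   # may be placed on SCM
    "silver": False,   # always resides on NAND flash
    "bronze": False,   # always resides on NAND flash
}


def may_use_scm(service_level: str) -> bool:
    """Return True if data with this service level may be placed on SCM."""
    key = service_level.strip().lower()
    if key not in SCM_ELIGIBILITY:
        # The text does not describe other levels, so do not assume.
        raise ValueError(f"service level not covered by this sketch: {service_level!r}")
    return SCM_ELIGIBILITY[key]
```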
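The 3% to 12% guideline is easy to check when planning a configuration. The following is a minimal sketch; the function names are hypothetical, and it assumes both capacities are expressed in the same unit.

```python
SCM_MIN_FRACTION = 0.03  # lower bound of the recommended range
SCM_MAX_FRACTION = 0.12  # upper bound of the recommended range


def scm_fraction(scm_capacity_tb: float, effective_capacity_tb: float) -> float:
    """Fraction of the effective system capacity provided by SCM."""
    if effective_capacity_tb <= 0:
        raise ValueError("effective capacity must be positive")
    return scm_capacity_tb / effective_capacity_tb


def scm_within_recommendation(scm_capacity_tb: float, effective_capacity_tb: float) -> bool:
    """True if the SCM share falls inside the recommended 3% to 12% range."""
    fraction = scm_fraction(scm_capacity_tb, effective_capacity_tb)
    return SCM_MIN_FRACTION <= fraction <= SCM_MAX_FRACTION
```

For example, 30 TB of SCM in a 500 TB effective configuration is 6% and falls inside the range, while 5 TB (1%) or 100 TB (20%) would fall outside it.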
Write Folding detects a situation where hosts re-write data to a particular address range. Rather than continuously write this data out, it is written to cache and updated there. EMC claims this can save up to 50% of data writes.
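The idea behind Write Folding can be sketched with a toy cache model. This is an illustration of the general technique only, not PowerMax internals: the class and method names are hypothetical, and a real array would destage on its own schedule rather than on demand.

```python
class WriteFoldingCache:
    """Toy cache that absorbs repeated writes to the same address."""

    def __init__(self):
        self._dirty = {}       # address -> most recent data written by a host
        self.host_writes = 0   # writes received from hosts
        self.drive_writes = 0  # writes actually sent to the drives

    def write(self, address: int, data: bytes) -> None:
        """Accept a host write; a re-write just replaces the cached copy."""
        self.host_writes += 1
        self._dirty[address] = data

    def destage(self) -> dict:
        """Flush the cache: only the latest version of each address is written out."""
        flushed = dict(self._dirty)
        self.drive_writes += len(flushed)
        self._dirty.clear()
        return flushed
```

If a host writes address 100 three times and address 200 once before a destage, the model records four host writes but only two drive writes, which is the kind of saving the technique aims for.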
Write Coalescing merges small random writes from different times into one large sequential write, which matches the page size within the storage drive. This allows PowerMax to convert a highly random write host I/O workload into what looks like a more efficient sequential write workload.
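Write Coalescing can be sketched in a similar way. In this simplified model, small writes are buffered until at least one page worth of data has accumulated and are then handed to the drive as a single batch. The class name and the 16 KiB page size are assumptions made for illustration; the actual page size and batching logic in PowerMax are not described in the text.

```python
PAGE_SIZE = 16 * 1024  # assumed drive page size in bytes, for illustration only


class WriteCoalescer:
    """Toy model that batches small writes into one page-sized write."""

    def __init__(self, page_size: int = PAGE_SIZE):
        self.page_size = page_size
        self._pending = []        # buffered (logical address, data) pairs
        self._pending_bytes = 0   # total bytes currently buffered

    def write(self, address: int, data: bytes):
        """Buffer a small write; return a batch once a page worth is pending."""
        self._pending.append((address, data))
        self._pending_bytes += len(data)
        if self._pending_bytes >= self.page_size:
            return self.flush()
        return None

    def flush(self) -> list:
        """Hand over everything buffered so far as one sequential batch."""
        batch = self._pending
        self._pending = []
        self._pending_bytes = 0
        return batch
```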
Advanced Wear Analytics is used to make sure writes are distributed across the entire storage tier. This balances out the load and avoids excessive writes and wear on specific drives. Advanced Wear Analytics has the added benefit of making it easy to add and rebalance additional storage into the system.
PowerMax data reduction techniques are used for Open Systems data; they do not apply to mainframe data.
Data reduction processes include using 'Inline Hardware Data Compression', which EMC claims prevents compression from using too much PowerMax system core resources. 'Activity Based Compression' checks how busy areas of data are, and compresses the least busy, while preventing any performance deterioration for busy data by not compressing it. 'Enhanced Compression' scans existing compressed data that has not been accessed for a long period of time, then tries to compress this data still further.
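The principle of Activity Based Compression can be sketched in software, although PowerMax does this work in hardware. In this simplified model, extents with an access count at or above a threshold are left uncompressed, and the rest are compressed with zlib. The function name, the access-count metric and the threshold are all assumptions for illustration.

```python
import zlib


def compress_cold_extents(extents: dict, access_counts: dict, busy_threshold: int) -> dict:
    """Compress only the extents that are accessed less often than the threshold.

    extents:        extent id -> raw bytes
    access_counts:  extent id -> recent access count (missing means 0)
    Returns:        extent id -> (is_compressed, stored bytes)
    """
    stored = {}
    for extent_id, data in extents.items():
        if access_counts.get(extent_id, 0) >= busy_threshold:
            # Busy data stays uncompressed to avoid a decompression penalty.
            stored[extent_id] = (False, data)
        else:
            stored[extent_id] = (True, zlib.compress(data))
    return stored
```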
PowerMax employs inline hardware deduplication to identify repeated data patterns on the array and just store one instance of those repeated patterns. Both deduplication and compression are performed on the same hardware module in the PowerMax system.
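Deduplication in general works by fingerprinting data and storing each unique pattern once. The following sketch shows that general idea using SHA-256 fingerprints. It is not a description of the PowerMax hardware implementation, whose fingerprinting method and block size are not given in the text, and the class name is hypothetical.

```python
import hashlib


class DedupStore:
    """Toy block store that keeps one copy of each unique block."""

    def __init__(self):
        self._blocks = {}  # fingerprint -> block data (stored once)
        self._refs = {}    # logical address -> fingerprint

    def write(self, address: int, data: bytes) -> None:
        """Store a block, reusing the existing copy if the pattern was seen before."""
        fingerprint = hashlib.sha256(data).hexdigest()
        self._blocks.setdefault(fingerprint, data)
        # Note: blocks no longer referenced after an overwrite are not
        # reclaimed in this sketch.
        self._refs[address] = fingerprint

    def read(self, address: int) -> bytes:
        """Return the data for a logical address via its fingerprint."""
        return self._blocks[self._refs[address]]

    @property
    def unique_blocks(self) -> int:
        """Number of distinct blocks physically stored."""
        return len(self._blocks)
```

Writing the same pattern to two addresses and a different pattern to a third leaves only two blocks physically stored, while all three addresses remain readable.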