A storage subsystem architecture can be split into a number of components: the Host adapters, which carry the communication channels to the outside world; the Device adapters, which communicate with the real, physical disks; the Cache, which stores data electronically to speed up performance; and the Processors, which manage all the other components. The subsystem also needs a communications architecture to connect all the parts together, and of course power systems to drive it and fans to cool it. The main components are discussed in the sections below. The important point is that the architecture should be non-blocking, as far as possible. This means that all requests can proceed simultaneously, without having to queue for a resource.
There are three main connectivity architectures, all discussed below. The main difference between them is the number of parallel communication operations they can support, which in turn determines the overall subsystem bandwidth. However, be aware that when suppliers quote overall bandwidth numbers, these are maximum theoretical figures that you would never see in real life.
Traditionally, subsystem components were connected together by a bus. Only one device can talk over a bus at a time, so other devices have to wait in a queue. Storage subsystems have several buses, and control information and data are usually segregated onto separate bus structures. However, bus technology limits internal bandwidth to under 2 GB/s. Bus technology is simple and cheap, but is difficult to scale up.
The bus architecture is illustrated in the gif below. Even though there are 16 host paths into the device, only 4 concurrent IOs are possible, as the connection bus has only 4 paths. If more than 4 IOs are scheduled, subsequent IOs are blocked as shown in red, until a bus becomes free. If the device is driven within its capability then blocking is not really a problem, but the point is that a bus architecture can be overloaded.
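The blocking behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not a model of any real subsystem: the function names `schedule_on_bus` and `slots_to_drain` are hypothetical, and the sketch assumes that every IO occupies one bus path for one equal time slot.

```python
import math


def schedule_on_bus(requested_ios: int, bus_paths: int = 4) -> tuple[int, int]:
    """Return (in_flight, blocked) IO counts for a shared-bus subsystem.

    Assumption: each IO needs exactly one bus path while it is active.
    """
    if requested_ios < 0 or bus_paths < 1:
        raise ValueError("need requested_ios >= 0 and bus_paths >= 1")
    in_flight = min(requested_ios, bus_paths)
    blocked = requested_ios - in_flight
    return in_flight, blocked


def slots_to_drain(requested_ios: int, bus_paths: int = 4) -> int:
    """Number of equal time slots needed to complete every IO."""
    return math.ceil(requested_ios / bus_paths)


# 16 host paths feeding 4 bus paths, as in the example in the text.
for ios in (3, 4, 16):
    in_flight, blocked = schedule_on_bus(ios)
    print(f"{ios} IOs: {in_flight} in flight, {blocked} blocked, "
          f"{slots_to_drain(ios)} slot(s) to drain")
```

With 4 or fewer IOs nothing is blocked, which matches the point that a bus only becomes a problem when the device is driven beyond its capability.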
Write IOs are shown in blue. When they reach the cache, they are effectively complete as far as the application is concerned. The staging down to disk happens in the background.
Read IOs are shown in green. Some of these are sequential IOs and so are pre-staged into cache to help application performance.
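The two cache behaviours just described, writes completing in cache and sequential reads being pre-staged, can be sketched with a toy class. Everything here is an assumption made for illustration: the class name `ToyCache`, the `read_ahead` parameter, and the rule that a read counts as sequential when it follows directly on the previous block. Real subsystems use their own, more elaborate algorithms.

```python
from typing import Optional


class ToyCache:
    """Toy write-back cache with sequential read-ahead (illustrative only)."""

    def __init__(self, disk: dict, read_ahead: int = 2):
        self.disk = disk            # stands in for the physical disks
        self.cache: dict = {}       # block number -> data held in cache
        self.dirty: set = set()     # blocks written but not yet destaged
        self.read_ahead = read_ahead
        self.last_read: Optional[int] = None

    def write(self, block: int, data: bytes) -> None:
        # The write is complete for the application once it is in cache.
        self.cache[block] = data
        self.dirty.add(block)

    def destage(self) -> int:
        # Background step: copy dirty blocks down to disk.
        count = len(self.dirty)
        for block in sorted(self.dirty):
            self.disk[block] = self.cache[block]
        self.dirty.clear()
        return count

    def read(self, block: int) -> tuple:
        # Returns (data, hit) where hit says whether the cache had the block.
        hit = block in self.cache
        if not hit:
            self.cache[block] = self.disk[block]
        # Assumed sequential rule: this read directly follows the last one.
        if self.last_read is not None and block == self.last_read + 1:
            for nxt in range(block + 1, block + 1 + self.read_ahead):
                if nxt in self.disk and nxt not in self.cache:
                    self.cache[nxt] = self.disk[nxt]  # pre-stage
        self.last_read = block
        return self.cache[block], hit


disk = {n: f"block-{n}".encode() for n in range(10)}
cache = ToyCache(disk)

cache.write(20, b"new data")
print("on disk before destage:", 20 in disk)
cache.destage()
print("on disk after destage:", 20 in disk)

for n in (0, 1, 2):
    _, hit = cache.read(n)
    print(f"read block {n}: {'hit' if hit else 'miss'}")
```

In this sketch the first two reads miss, the second one is recognised as sequential, and the third read is then served from pre-staged data.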
Most large storage subsystems use a Switched Architecture, which could be considered as a SAN in a box. Components are attached by fibre or copper links to switches. As new components are added, more links are added to cope.
Switched architecture is usually implemented as a PCIe configuration. The paths between the components are called 'lanes' and are full duplex; that is, they can communicate in both directions at once.
The gif below illustrates the principle behind a switched architecture. It will be non-blocking as long as the fan-out to fan-in ratio on the switches is 1:1. The diagram is obviously simplified to make it reasonable to draw. A real switch has up to 64 connections on each side.
EMC's Direct Matrix technology uses dedicated fibre links to connect each component to every other component it needs to talk to. So each HBA is connected to every cache segment with a dedicated link, and every cache segment is connected to every disk adapter with a dedicated link. If new cache segments or new disk adapters are added, then a new set of dedicated links is added. That makes the subsystem fully scalable, without sacrificing any internal performance.
The gif below illustrates how the matrix can always cope with any data going between the adapters and the cache. Again, this is a very simplified diagram; the real thing has a lot more connections.
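To get a feel for how the number of dedicated links grows, the matrix can be counted with a short sketch. The function name `matrix_links` and the component counts used below are assumptions chosen for illustration, not figures from any real product; the only rule taken from the text is that every host adapter links to every cache segment, and every cache segment links to every disk adapter.

```python
def matrix_links(host_adapters: int, cache_segments: int, disk_adapters: int) -> int:
    """Count dedicated links in a fully connected adapter/cache matrix."""
    front_end = host_adapters * cache_segments   # host adapter <-> cache
    back_end = cache_segments * disk_adapters    # cache <-> disk adapter
    return front_end + back_end


# Hypothetical configuration: 8 host adapters, 4 cache segments, 8 disk adapters.
before = matrix_links(8, 4, 8)
after = matrix_links(8, 5, 8)   # one extra cache segment
print(f"links before: {before}, after: {after}, added: {after - before}")
```

Under these assumptions, adding one cache segment adds one link per host adapter plus one per disk adapter, so existing paths are never shared with the new component.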
Host adapters (HA) are used to connect the external communications channels (Fibre, SAS, SATA, SCSI, ESCON, FICON) to the communications web within the storage subsystem. Typical terminology includes HBA (Host Bus Adapter) or HB (Host Bay). The Host Bay generally connects several external channels to several internal channels. For example, an HB might have 8 incoming Fibre Channel ports, which connect to 4 internal buses. In this case, it is evident that this is a blocking architecture, as the data channels are reduced in a ratio of 2:1. If the architecture is n:n, then it is a good idea to check with the vendor that this really means there can be n simultaneous data operations without any blocking.
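The port-to-path arithmetic above is easy to check for any configuration. The following is a minimal sketch; the helper names `path_ratio` and `is_blocking` are hypothetical, and it assumes that each external port and each internal path carries one data operation at a time.

```python
from math import gcd


def path_ratio(external_ports: int, internal_paths: int) -> tuple[int, int]:
    """Reduce external:internal to lowest terms, e.g. 8:4 becomes 2:1."""
    divisor = gcd(external_ports, internal_paths)
    return external_ports // divisor, internal_paths // divisor


def is_blocking(external_ports: int, internal_paths: int) -> bool:
    """True when more ports can be active than there are internal paths."""
    return external_ports > internal_paths


for ports, paths in ((8, 4), (8, 8)):
    outer, inner = path_ratio(ports, paths)
    state = "blocking" if is_blocking(ports, paths) else "non-blocking"
    print(f"{ports} ports onto {paths} paths = {outer}:{inner} ({state})")
```

A 1:1 result only shows that the path counts match; whether n operations can really run at once still depends on the internals, which is why the question to the vendor is worth asking.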
Most enterprise disk communication is now either Switched Fabric or Point-to-Point. Some other options are:
SCSI (Small Computer Systems Interface) is parallel bus based, and can support up to 15 devices. Only two devices can communicate on the bus at a time. Faster variants of SCSI have been introduced, including SCSI2, Ultra2 SCSI and SCSI Express. SCSI Express can reach up to 985 MB/s burst transfers.
FC-AL (Fibre Channel-Arbitrated Loop) can use either optical or copper connections, and can connect up to 127 devices. It is almost always used now as part of a switched fabric and called Switched FC-AL. It is a loop architecture, as opposed to a bus architecture like SCSI, and can handle a maximum data transfer rate of 100 MB/sec. Once two devices 'own the loop' they get exclusive use. This makes the architecture 'blocking', as subsequent requests have to queue until the loop is free. FC-AL will tolerate multiple device failures: the Port Bypass Circuit (PBC) will simply bypass the failing device, but maintain the loop integrity.
The bottom line for application performance used to be the physical disks. These days the cache is the bottom line for most IO operations. As far as the application is concerned, all write IOs usually terminate in the cache now, and the final write out to disk is asynchronous. The cache is therefore very important in large disk subsystems. The cache will be a blocking point. To reduce the impact of this it should be segmented, and have several data paths through it. A typical configuration would have 4 cache segments, each with 16 data paths, allowing 64 concurrent IO operations. A segmented cache is also more resilient, as the subsystem can then survive the failure of a cache segment.
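The concurrency figure quoted above is simple multiplication, and the same sum shows what a segment failure costs. This is a rough sketch: the function name `concurrent_ios` is hypothetical, and it assumes that a failed segment removes only its own data paths while the remaining segments carry on unaffected.

```python
def concurrent_ios(cache_segments: int, paths_per_segment: int) -> int:
    """Maximum concurrent IOs if every data path can be busy at once."""
    if cache_segments < 0 or paths_per_segment < 0:
        raise ValueError("counts must not be negative")
    return cache_segments * paths_per_segment


healthy = concurrent_ios(4, 16)    # the typical configuration in the text
degraded = concurrent_ios(3, 16)   # assumed: one segment has failed
print(f"healthy: {healthy} concurrent IOs, one segment lost: {degraded}")
```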
The processors are at the heart of the subsystem. These are the CPUs which contain all the microcode which describes the subsystem emulation, the RAID set-up, the channel connectivity and more. Terms used include 'cluster' and 'ACP'. You need at least 2 processors to make concurrent microcode changes, as then you can switch one processor off while you alter it. The subsystem will run on the remaining processors, though performance may be degraded.
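The reason two processors are the minimum can be shown with a small sketch of a one-at-a-time update. This is purely illustrative: the function name `rolling_microcode_update`, the processor names and the microcode level strings are all hypothetical, and real subsystems follow vendor-specific procedures.

```python
def rolling_microcode_update(levels: dict, new_level: str) -> list:
    """Update processors one at a time and return a log of the steps.

    `levels` maps processor name -> current microcode level and is
    updated in place. At least one processor stays online throughout.
    """
    if len(levels) < 2:
        raise ValueError("a concurrent update needs at least 2 processors")
    log = []
    for name in sorted(levels):
        online = sorted(p for p in levels if p != name)
        log.append(f"{name} offline, running on {', '.join(online)}")
        levels[name] = new_level
        log.append(f"{name} back online at level {new_level}")
    return log


processors = {"cluster0": "1.0", "cluster1": "1.0"}
for step in rolling_microcode_update(processors, "1.1"):
    print(step)
```

With a single processor the function refuses to run, because taking that processor offline would stop the whole subsystem.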