Storage Area Networks - SAN Fabrics

In basic terms, a SAN is all about connecting hardware components together with cables. Hardware components need a common interface method to allow them to talk to each other, and the most common interface is SCSI (Small Computer System Interface). Three protocols can send signals down wires: Fibre Channel, Ethernet and SCSI. Intermediate interfaces sit between these options, so the working options are: SCSI, iSCSI to Ethernet, FCIP to Ethernet, FCoE to Ethernet, and Fibre Channel.

SCSI is the standard I/O bus protocol which provides a common command set between devices and is discussed in detail here. As SCSI can interface directly to both hardware and cables, it can be used exclusively to form a SAN. The problem with SCSI is that cable lengths are limited to about 15m, and only 15 devices can be placed on a SCSI bus. Even though a SCSI SAN is very cheap to implement, these limits mean that a pure SCSI SAN is not normally a practical proposition.

iSCSI, or internet SCSI, interfaces with TCP/IP to allow SCSI to run over an Ethernet network. iSCSI SANs were proposed at the end of 2002 and matured into usable products in 2004. The biggest advantage of iSCSI is that it is much cheaper than Fibre Channel. It also makes clustering easier, supports multi-path I/O and makes multi-site replication easier. iSCSI is still considered the optimum choice for moderate to low performance applications, though many people do use it for top class work. It is typically used for Windows, NetWare or Linux, with some small Unix take-up. Application servers that are Ethernet enabled can be attached directly to an iSCSI SAN; they do not need expensive HBA cards.

Like all Fibre Channel protocols, FCIP, or Fibre Channel over IP, uses FCP (Fibre Channel Protocol), which interfaces with SCSI and translates SCSI commands to Fibre Channel. FCIP, sometimes called Fibre Channel tunnelling, lets Fibre Channel run over IP networks and so use Ethernet. Note the different conversion levels here: SCSI - FCP - FCIP - TCP/IP - Ethernet. FCIP is ideal for geographically dispersed SANs as it avoids the need for DWDM (dense wavelength division multiplexing) switches over long distances, and so is cheaper. It works a bit like an ISL (inter-switch link), so it will join two separate fabrics into a single fabric, unless it is combined with a router, in which case the fabrics are kept separate.

FCoE is described by the name, 'Fibre Channel over Ethernet'. The difference from FCIP is that it does not use the TCP/IP stack but interfaces directly with Ethernet instead. It is usually used with Converged Network Adapters (CNAs) to combine data and storage into a single network and so reduce cabling requirements. For this to work, the Ethernet standard had to be enhanced to stop frames from being lost at busy times.
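
The layering of the options in this section can be written out side by side. The sketch below is only an illustration: the layer names are plain labels taken from the text, not identifiers from any library, and the helper functions are hypothetical.

```python
# Protocol layers from the SCSI command set (first) down to the wire (last),
# as described in the text. These are labels only, not library identifiers.
STACKS = {
    "SCSI": ["SCSI"],
    "iSCSI": ["SCSI", "iSCSI", "TCP/IP", "Ethernet"],
    "FCIP": ["SCSI", "FCP", "FCIP", "TCP/IP", "Ethernet"],
    "FCoE": ["SCSI", "FCP", "FCoE", "Ethernet"],
    "Fibre Channel": ["SCSI", "FCP", "Fibre Channel"],
}


def uses_tcp_ip(option):
    """True if the option runs over the TCP/IP stack."""
    return "TCP/IP" in STACKS[option]


def conversions(option):
    """Number of layer-to-layer hand-offs between SCSI and the wire."""
    return len(STACKS[option]) - 1


if __name__ == "__main__":
    for name, layers in STACKS.items():
        print(f"{name:14} {' - '.join(layers)}")
```

Laid out this way, the difference between FCIP and FCoE is easy to see: both end on Ethernet, but only FCIP passes through TCP/IP on the way.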

Fibre Channel uses FCP to talk to SCSI, but can talk to the cables direct. It has three main variants: Point to Point, Arbitrated Loop and Switched Fabric.
Point to point is simply a pair of fibers that connect one server to one storage device. While not a SAN, this is an improvement over SCSI as the cable distance can be much longer, but the big problem is that there is no room for expansion.
Older SAN fabrics often used FC-AL (Fibre Channel Arbitrated Loop), which was very suitable for connecting tape drives to servers, and was often implemented as a Hub, a single device that healed itself if one port became inoperative. FC-AL is still typically used inside storage subsystems to manage the disk strings. The nodes on FC-AL share the bandwidth, so as more nodes are added, performance degrades.
Switched fabrics are expandable and perform best, as each port gets the full bandwidth. This means that available network bandwidth actually increases as devices are added. Switched Fabrics are discussed in detail below.
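
The bandwidth difference between an arbitrated loop and a switched fabric comes down to simple arithmetic. The sketch below is a deliberately idealised model: it assumes every node is active at once, ignores arbitration and protocol overhead, and uses a hypothetical 4 Gbit/s link speed purely for illustration.

```python
def loop_bandwidth_per_node(link_gbps, nodes):
    """FC-AL: all nodes share one link, so each gets a fraction of it."""
    return link_gbps / nodes


def switched_aggregate_bandwidth(link_gbps, ports):
    """Switched fabric: each port gets the full link speed, so the total
    grows with the number of ports (idealised, non-blocking switch)."""
    return link_gbps * ports


if __name__ == "__main__":
    LINK_GBPS = 4.0  # assumed link speed, for illustration only
    for n in (2, 4, 8, 16):
        per_node = loop_bandwidth_per_node(LINK_GBPS, n)
        total = switched_aggregate_bandwidth(LINK_GBPS, n)
        print(f"{n:2} nodes: loop {per_node:.2f} Gbit/s each, "
              f"switched {total:.0f} Gbit/s in total")
```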

   


Switched SAN Fabrics

The most common type of Fibre Channel SAN today is a switched fabric. Just to set the scene, here are some common terms that are used when talking about a SAN.
A 'Node' is a server or a storage device.
A 'Fabric' is the network that connects these nodes together, and includes the network switches.
A 'Domain' is a single switch within a fabric. A fabric can contain up to 239 domains, and in theory, the fabric can scale to about 16 million connections.
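
The 'about 16 million' figure follows from the 24-bit Fibre Channel port address, which is commonly split into three 8-bit fields for domain, area and port. The sketch below works through that arithmetic; the function and constant names are made up for illustration.

```python
ADDRESS_BITS = 24   # width of a Fibre Channel port address
FIELD_BITS = 8      # domain, area and port fields are 8 bits each
MAX_DOMAINS = 239   # usable switch domains, as stated in the text


def split_address(fcid):
    """Split a 24-bit address into its (domain, area, port) fields."""
    if not 0 <= fcid < 2 ** ADDRESS_BITS:
        raise ValueError("address must fit in 24 bits")
    return (fcid >> 16) & 0xFF, (fcid >> 8) & 0xFF, fcid & 0xFF


# Every possible 24-bit value: the theoretical ceiling.
total_addresses = 2 ** ADDRESS_BITS

# Addresses left when only 239 of the 256 domain values are usable.
usable_addresses = MAX_DOMAINS * 2 ** (2 * FIELD_BITS)

if __name__ == "__main__":
    print(total_addresses)          # 16777216
    print(usable_addresses)         # 15663104
    print(split_address(0x010200))  # (1, 2, 0)
```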
Switches can vary in size from small 16 port devices with little or no redundancy, to large 'directors' with hundreds of ports and no single point of failure. The principle behind designing a SAN is to optimise performance, management and scalability, within cost constraints of course.
There are five main types of fabric:

Single Switch

Single switch is the simplest fabric, but it can vary from a single 16 port switch to connect a few servers and a couple of storage devices, to a large director with hundreds of ports that connects a large enterprise together. A small switch is a single point of failure and will not scale, but can be a good way to start out.
Large switches, or directors, are sophisticated pieces of kit and can have enough redundant components internally that they have no internal SPOF (single point of failure). In this situation, is the director itself then a SPOF? I've heard that one argued both ways. A large switch is also scalable, until it runs out of spare ports.


Single Switch SAN

Now a variation on a single switch is two switches (bear with me). If all your servers and storage devices are dual pathed, and you have a multi-pathing failover capability, then you can build a very resilient network with two single switch fabrics. Every device is connected to both fabrics, so if any part of one fabric goes down, all will work fine on the other fabric until the problem is fixed. This is a much better option than a single switch.
Single Switch / Dual Fabric SAN
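
The dual fabric argument can be checked with a few lines of set arithmetic. This is a minimal sketch, assuming fabrics are identified by hypothetical names such as "A" and "B", and that multi-pathing software will use any fabric that still connects both ends.

```python
def surviving_fabrics(host_fabrics, storage_fabrics, failed_fabrics):
    """Fabrics that still connect the host to the storage device."""
    return (set(host_fabrics) & set(storage_fabrics)) - set(failed_fabrics)


def survives(host_fabrics, storage_fabrics, failed_fabrics):
    """True if at least one working path remains."""
    return bool(surviving_fabrics(host_fabrics, storage_fabrics, failed_fabrics))


if __name__ == "__main__":
    # Dual-pathed host and storage: losing fabric A is survivable.
    print(survives({"A", "B"}, {"A", "B"}, {"A"}))  # True
    # Single-pathed host on fabric A: losing fabric A cuts it off.
    print(survives({"A"}, {"A", "B"}, {"A"}))       # False
```

The second case shows why every device, not just the storage, has to be dual pathed for the design to work.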

Cascade

In a Cascade SAN, the switches are simply interconnected in a line, as shown below. There may or may not be a top level switch. An issue with this design is that you do not want to go through several hops to connect devices at either end of the line, so you really need to localise your paths through the fabric, if possible, so that they go through no more than one ISL and, ideally, connect within the same switch. This makes a cascade design difficult to scale or change later. On the plus side, a cascade design does not need too many ISLs. The main issue with a cascade SAN is that if a switch fails, then some switches will not be able to communicate with each other. For that reason it is rarely seen these days.
Cascade Fabric SAN

Loop or Ring Fabric

A Loop fabric is essentially just a cascade fabric with the end switches connected together to form a ring. It is slightly harder to extend than a cascade, as you need to break the loop to install another switch. However, if one switch fails, the other switches can still communicate with each other. Otherwise it has the same drawbacks and benefits as a cascade SAN.
Loop Fabric SAN
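
The hop count and failure behaviour of cascade and loop fabrics can be compared with a small graph model. This is a sketch under simple assumptions: switches are numbered from 0, each ISL is a single link, and the function names are hypothetical.

```python
from collections import deque


def cascade(n):
    """Switches 0..n-1 joined in a line by ISLs."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}


def ring(n):
    """A cascade with the two end switches also joined."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def hops(fabric, src, dst, failed=frozenset()):
    """Fewest ISL hops from src to dst avoiding failed switches, or None."""
    failed = set(failed)
    if src in failed or dst in failed:
        return None
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in fabric[node] - seen - failed:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None


if __name__ == "__main__":
    print(hops(cascade(5), 0, 4))              # 4: end to end of the line
    print(hops(ring(5), 0, 4))                 # 1: the ends are neighbours
    print(hops(cascade(5), 0, 4, failed={2}))  # None: the line is cut
    print(hops(ring(5), 1, 3, failed={2}))     # 3: traffic goes the long way
```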

Full Mesh

In a Full Mesh SAN, every switch is connected to every other switch with an ISL. The advantage of this approach is that you can connect any device to any open port in the fabric, and know that it can connect to any other device after just one switch hop. The big disadvantage is scalability. When you add a new switch, it must be connected to every other switch in the fabric. It is obviously not suitable for low port count switches, as then most of the ports are used for ISLs. It can be scalable and effective for big switches, where the total port count is less than 2,000.
Full Mesh SAN
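
The scalability problem with a full mesh is easy to quantify. The sketch below assumes exactly one ISL between each pair of switches (no trunking) and identical switches; the function names are made up for illustration.

```python
def full_mesh_isls(switches):
    """ISLs needed to connect every switch to every other switch."""
    return switches * (switches - 1) // 2


def free_ports_per_switch(ports_per_switch, switches):
    """Ports left for servers and storage once ISLs are cabled."""
    return max(0, ports_per_switch - (switches - 1))


def full_mesh_device_ports(ports_per_switch, switches):
    """Total device ports across the whole fabric."""
    return switches * free_ports_per_switch(ports_per_switch, switches)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(f"{n:2} x 16-port switches: {full_mesh_isls(n):3} ISLs, "
              f"{full_mesh_device_ports(16, n):3} device ports")
```

With 16-port switches, going from 8 to 16 switches drops the usable device ports from 72 to 16, which is why the design does not suit low port count switches.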

Core/Edge

A Core/Edge SAN is a logical progression from full mesh as it does away with the requirement for lots of ISLs while preserving the one switch hop rule. It uses a high performance, highly available director for the core switch, which is connected directly to high performance servers and storage. Appliances that need lesser performance are connected to the core by slower edge switches. In some implementations, the storage devices are connected to the core and the servers to the edge switches.
Core-Edge SAN

Once you start building large SANs with lots of switches, you might not want to go to the expense of having two complete and separate fabrics for failover, but you still want redundancy. A Federated Fabric SAN contains redundant switches, so that every server is connected to two switches, and has two independent paths through the SAN to the storage. Again, the host servers must have multi-pathing software that can automatically fail over if a path fails, and ideally load balance when two paths are available.
When a complex fabric is started up, it needs something to make sure that the switches are connected correctly, have unique domain ids, and are time synchronised. This can be done using fabric management software, but the Fibre Channel standard allows for a 'Principal Switch' or master switch to manage the network, and this needs to be your best performing switch.
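
One of the start-up checks mentioned above, unique domain IDs, is simple to express in code. This is a minimal sketch, not how any particular fabric management product works: the switch names are hypothetical, and the valid range of 1 to 239 is assumed from the 239-domain limit mentioned earlier.

```python
from collections import Counter

VALID_DOMAIN_IDS = range(1, 240)  # 1..239 inclusive (assumed valid range)


def domain_id_problems(switch_domains):
    """Return a list of problems for a {switch_name: domain_id} mapping."""
    counts = Counter(switch_domains.values())
    problems = []
    for name, domain_id in sorted(switch_domains.items()):
        if domain_id not in VALID_DOMAIN_IDS:
            problems.append(f"{name}: domain id {domain_id} is outside 1-239")
        elif counts[domain_id] > 1:
            problems.append(f"{name}: domain id {domain_id} is duplicated")
    return problems


if __name__ == "__main__":
    fabric = {"core1": 1, "edge1": 10, "edge2": 10, "edge3": 240}
    for problem in domain_id_problems(fabric):
        print(problem)
```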

An Edge-Core-Core-Edge fabric is a variant of this model, often used for two site deployments. The two core switches are a large distance apart, and are usually connected by DWDM switches. The connecting links between the core switches can be called a backbone. This model can have redundancy either by having two core switches on each side of the backbone, or by having two separate fabrics, with servers and storage having connections to each fabric.

Edge-Core-Core-Edge SAN

SAN Fabric recommendations

  • Use core/edge or edge/core/core/edge topology as appropriate
  • Separate storage and servers by edge switches
  • Make sure core switches are highest performance
  • Use MPIO based failover
  • Build in redundancy for switches, ports and paths
  • Use redundant fabrics
  • Keep redundant fabrics consistent as regards port locations to simplify maintenance
  • Take the redundancy right down to blades, ASICs and port groups to ensure resilience
  • Consider using ICL chassis links if your switches support them to save on ports
  • Have a minimum of 2 core switches
  • Have at least 2 trunks between every core - edge pair
  • Keep cable lengths consistent, especially for ICL connections
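
Two of the recommendations above, a minimum of two core switches and at least two trunks between every core and edge pair, lend themselves to an automated check. The sketch below assumes a hypothetical inventory format in which trunk counts are keyed by (core, edge) name pairs; it is an illustration, not a vendor tool.

```python
MIN_CORES = 2
MIN_TRUNKS = 2


def check_core_edge(cores, edges, trunks):
    """Return findings for a core/edge design.

    trunks maps (core_name, edge_name) to the number of trunks between
    them; a missing pair counts as zero trunks.
    """
    findings = []
    if len(cores) < MIN_CORES:
        findings.append(f"fewer than {MIN_CORES} core switches")
    for core in cores:
        for edge in edges:
            if trunks.get((core, edge), 0) < MIN_TRUNKS:
                findings.append(f"{core}-{edge}: fewer than {MIN_TRUNKS} trunks")
    return findings


if __name__ == "__main__":
    cores = ["core1", "core2"]
    edges = ["edge1", "edge2"]
    trunks = {
        ("core1", "edge1"): 2, ("core1", "edge2"): 2,
        ("core2", "edge1"): 2, ("core2", "edge2"): 1,
    }
    print(check_core_edge(cores, edges, trunks))
```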

Virtual fabrics can be created by partitioning physical switch ports into several logical switches. This can improve switch utilisation.

Hubs, Gateways and Routers

You may still want to connect an FC-AL loop into your SAN through an FL_Port, as some devices will only support FC-AL. However, rather than cable up all the devices on the loop point to point, it is better to attach them through a Hub. A Hub is a cable-connecting appliance in which all devices are interconnected, but only one can use the network at a time. A switch, by contrast, connects several pairs of devices together, and each pair can use the network simultaneously.

Gateways provide tunnelling services and protocol conversion between different services, such as SCSI to FCP.

Routers are typically used to share devices between fabrics without merging the fabrics. If you connect two fabrics by using an ISL between two switches, then the two fabrics merge into one. Routers can be used between two local fabrics to keep them separate, or they can be combined with DWDM and dark fibers to connect two remote fabrics using FCIP.
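
The difference between joining fabrics with an ISL and joining them with a router can be modelled by grouping switches into fabrics. This is a simplified sketch: the link kinds "isl" and "router" and the switch names are labels invented for the example.

```python
def fabrics_after(links, switches):
    """Group switches into fabrics.

    links is a list of (switch_a, switch_b, kind) tuples. An "isl" link
    merges the two fabrics; a "router" link leaves them separate.
    """
    parent = {s: s for s in switches}

    def find(s):
        # Follow parent pointers up to the representative switch.
        while parent[s] != s:
            s = parent[s]
        return s

    for a, b, kind in links:
        if kind == "isl":
            parent[find(a)] = find(b)

    groups = {}
    for s in switches:
        groups.setdefault(find(s), set()).add(s)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    switches = ["a1", "a2", "b1", "b2"]
    local = [("a1", "a2", "isl"), ("b1", "b2", "isl")]
    print(fabrics_after(local + [("a2", "b1", "router")], switches))
    print(fabrics_after(local + [("a2", "b1", "isl")], switches))
```

In the first case the result is two fabrics of two switches each; in the second, all four switches end up in a single merged fabric.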

Host Bus Adaptors

Host Bus Adaptors, or HBAs, are circuit boards that are installed within a server. As the name implies, they are the interface between the fiber cable and the internal server bus. HBAs can support copper or fiber, and usually have two GBIC or SFP external connections. HBAs are expensive relative to the price of a server, which is one reason why iSCSI SANs are becoming popular.
