In basic terms, a SAN is all about connecting hardware components together with cables. Hardware components need a common interface method to allow them to talk to each other, and the most common interface is SCSI (Small Computer System Interface). Three protocols can send signals down wires: Fibre Channel, Ethernet and SCSI. Intermediate interfaces sit between these options, so the working options are SCSI, iSCSI to Ethernet, FCIP to Ethernet, FCoE to Ethernet, and Fibre Channel.
Non-Volatile Memory Express, or NVMe, is an open standards protocol specifically designed for communications between servers and Flash / SCM storage. NVMe-oF, or NVMe over Fabrics, allows NVMe to work over a network fabric such as a Fibre Channel SAN. NVMe is fully discussed in the NVMe page.
SCSI is the standard I/O bus protocol which provides a common command set between devices and is discussed in detail here. As SCSI can interface directly to both hardware and cables, it can be used exclusively to form a SAN. The problem with SCSI is that cable lengths are limited to about 15m, and only 15 devices can be placed on a SCSI bus. Even though a SCSI SAN is very cheap to implement, these limits mean that a pure SCSI SAN is not normally a practical proposition.
iSCSI, or Internet SCSI, interfaces with TCP/IP to allow SCSI to run over an Ethernet network. iSCSI SANs were proposed at the end of 2002 and matured into usable products in 2004. The biggest advantage of iSCSI is that it is much cheaper than Fibre Channel. It also makes clustering easier, supports multi-path IO and makes multi-site replication easier. iSCSI is still considered the optimum choice for moderate to low performance applications, though many people do use it for top class work. It is typically used for Windows, NetWare or Linux, with some small Unix takeup. Application servers that are Ethernet enabled can be attached directly to an iSCSI SAN; they do not need expensive HBA cards.
Like all Fibre Channel protocols, FCIP or Fibre Channel over IP uses FCP or Fibre Channel Protocol, which interfaces with SCSI and translates SCSI commands to Fibre Channel. FCIP, sometimes called Fibre Channel tunnelling, lets Fibre Channel run over IP networks and so use Ethernet. Note the different conversion levels here: SCSI - FCP - FCIP - TCP/IP - Ethernet. FCIP is ideal for geographically dispersed SANs as it avoids the need for DWDM (dense wavelength division multiplexing) switches over long distances and so is cheaper. It works a bit like an ISL, so it will join two separate fabrics into a single fabric, unless it is combined with a router, in which case the fabrics are kept separate.
FCoE is described by the name, 'Fibre Channel over Ethernet'. The difference from FCIP is that it does not use the TCP/IP stack but interfaces directly with Ethernet instead. It is usually used with Converged Network Adapters (CNAs) to combine data and storage into a single network and so reduce cabling requirements. For this to work, the Ethernet standard had to be enhanced to stop frames from being lost at busy times.
Fibre Channel uses FCP to talk to SCSI, but can talk to the cables direct. It has three main variants, Point to Point, Arbitrated Loop and Switched Fabric.
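The layering of the options described above can be written down as a small lookup table. This is only a sketch that follows the simplified conversion levels used in the text; the names PROTOCOL_STACKS, conversion_path and uses_tcp_ip are made up for illustration and are not part of any standard or library.

```python
# Each entry lists the layers from the SCSI command set (top) down to the
# medium that carries the signal (bottom), as described in the text.
# The table and helper names are hypothetical, chosen for this example.
PROTOCOL_STACKS = {
    "SCSI": ["SCSI"],
    "iSCSI": ["SCSI", "iSCSI", "TCP/IP", "Ethernet"],
    "FCIP": ["SCSI", "FCP", "FCIP", "TCP/IP", "Ethernet"],
    "FCoE": ["SCSI", "FCP", "FCoE", "Ethernet"],
    "Fibre Channel": ["SCSI", "FCP", "Fibre Channel"],
}


def conversion_path(transport):
    """Return the conversion levels for a transport as a readable string."""
    return " - ".join(PROTOCOL_STACKS[transport])


def uses_tcp_ip(transport):
    """True if the transport relies on the TCP/IP stack."""
    return "TCP/IP" in PROTOCOL_STACKS[transport]


if __name__ == "__main__":
    for name in PROTOCOL_STACKS:
        print(f"{name:14} {conversion_path(name)}")
```

Laid out like this, the difference between FCIP and FCoE is easy to see: both end on Ethernet, but only FCIP passes through TCP/IP.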
Point to point is simply a pair of fibers that connect one server to one storage device. While not a SAN, this is an improvement over SCSI as the cable distance can be much longer, but the big problem is that there is no room for expansion.
Older SAN fabrics often used FC-AL (Fibre Channel Arbitrated Loop), which was very suitable for connecting tape drives to servers, and was often implemented as a Hub, a single device that healed itself if one port became inoperable. FC-AL is still typically used inside storage subsystems to manage the disk strings. The nodes on FC-AL share the bandwidth, so as more nodes are added, performance degrades.
Switched fabrics are expandable and perform best as each port gets the full bandwidth. This means that available network bandwidth actually increases as devices are added. Switched Fabrics are discussed in detail below.
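The bandwidth difference between an arbitrated loop and a switched fabric comes down to simple arithmetic. The sketch below assumes an idealised model in which loop nodes share the link evenly and every switch port runs at full line rate; the function names and the 4 Gbps figure are illustrative only.

```python
def loop_bandwidth_per_node(link_gbps, nodes):
    """FC-AL: all nodes share one loop, so each gets a fraction of the link.

    Assumes the bandwidth is divided evenly between active nodes.
    """
    return link_gbps / nodes


def fabric_aggregate_bandwidth(link_gbps, ports):
    """Switched fabric: each port gets the full link rate,
    so the total available bandwidth grows with the port count."""
    return link_gbps * ports


if __name__ == "__main__":
    for count in (2, 8, 32):
        per_node = loop_bandwidth_per_node(4, count)
        total = fabric_aggregate_bandwidth(4, count)
        print(f"{count:2} devices: loop {per_node} Gbps each, fabric {total} Gbps total")
```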
A Hub is used in a LAN rather than a SAN, but is mentioned here for completeness. A Hub is a cable connecting appliance in which all devices are interconnected, but only one can use the network at a time. A Hub would not normally be used for devices that are attached to the internet, as when it receives a data packet, it broadcasts it out to all the other devices attached to the hub, regardless of which one ends up being the final destination. The network bandwidth is split between all of the connected devices, so if several devices are active, the bandwidth is shared between them, which slows performance down.
A switch connects several pairs of devices together and each pair can use the network simultaneously. The difference between a hub and a switch is that when a switch receives a packet of data, rather than broadcasting it out to all the devices attached to the switch, it works out which device the packet is addressed to and sends it to that device only. This makes the network more secure, more efficient and faster, which is why switches are a better bet than a hub. Modern switches must be NVMe-oF capable to meet the performance requirements of today's networks.
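The forwarding difference between a hub and a switch can be sketched in a few lines. This is a simplified model, not how any particular product works: ports are plain integers, and the switch is assumed to already hold a table mapping each device name to the port it is attached to. All names are hypothetical.

```python
def hub_forward(ports, in_port):
    """A hub repeats a packet on every port except the one it arrived on."""
    return [port for port in ports if port != in_port]


def switch_forward(address_table, destination):
    """A switch looks up the destination and sends to that port only.

    address_table maps device name -> port number and is assumed to be
    already populated for every attached device.
    """
    return [address_table[destination]]


if __name__ == "__main__":
    ports = [1, 2, 3, 4]
    table = {"server_a": 1, "server_b": 2, "array_1": 3, "array_2": 4}
    print("hub sends to ports:   ", hub_forward(ports, in_port=1))
    print("switch sends to ports:", switch_forward(table, "array_1"))
```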
Hubs and switches connect devices; routers connect networks. The function of a router is to route data packets to other networks, instead of just the local computers. A router is a type of Gateway, see below. A common use of a router is to connect a business (or a home) to the Internet.
SAN Routers are typically used to share devices between fabrics without merging the fabrics. If you connect two fabrics by using an ISL between two switches, then the two fabrics merge into one. Routers can be used between two local fabrics to keep them separate, or they can be combined with DWDM and dark fiber to connect two remote fabrics using FCIP.
In SAN routing, connection between ports is terminated at each SAN island. This ensures that transactions can be carried out without the danger that problems at any single SAN island will spread to others. The probability of a major network disruption is thereby minimized. Problems can be isolated and resolved quickly. SAN routing also eliminates confusion if addresses overlap between SAN islands in the context of a larger network.
Different networks may use different protocols, a bit like speaking different languages, to communicate between the devices on that network. A Gateway will translate between those protocols or languages so the networks can understand each other, for example, SCSI to FCP. Gateways also provide tunneling services to establish an end-point to end-point data link, like a VPN for example. In enterprise networks, a network gateway usually also acts as a proxy server and a firewall.
One specialised gateway is a cloud storage gateway, which translates cloud storage APIs such as SOAP or REST to block-based storage protocols such as iSCSI or Fibre Channel. A Cloud Gateway is used to connect applications to private clouds.
Other gateways could be a router or a firewall. Gateways live on the 'edge' of networks, as all data must flow through them before coming into or going out of the network.
A proxy server is another type of gateway that uses a combination of hardware and software to filter traffic between two networks. For example, a proxy server may only allow local computers to access a list of authorized websites.
SAN attached devices like servers, disk arrays and printers need an interface to connect them to the SAN. The three types of adaptor are HBA, NIC and CNA.
Host Bus Adaptors or HBAs are circuit boards that are installed within a server. As the name implies, they are the interface between the fiber cable and the internal server bus. HBAs can support copper or fiber, and usually have two GBIC or SFP external connections.
A NIC, or Network Interface Card, is also sometimes called an Ethernet card or network adapter. These days most devices have either Ethernet capabilities integrated into the motherboard chipset, or use an inexpensive dedicated Ethernet chip connected through the PCI or PCI Express bus. A separate NIC is generally no longer needed. So while an HBA is used for Fibre Channel, a NIC is used for Ethernet.
A converged network adapter (CNA) is sometimes also called a converged network interface controller (C-NIC) and connects to a server via a PCIe interface. This combines the functionality of both HBA and NIC cards into one device, which supports both Fibre Channel and Ethernet. The server sends both FC SAN and LAN traffic to an Ethernet port on a converged switch, using the Fibre Channel over Ethernet (FCoE) protocol for the FC SAN data and the Ethernet protocol for LAN data. The converged switch converts the FCoE traffic to FC and sends it to the FC SAN. The Ethernet traffic is sent directly to the LAN.
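A rough sketch of how a converged switch might separate the two kinds of traffic is shown below. It relies on the fact that FCoE frames carry their own EtherType (0x8906) in the Ethernet header. Everything else is a simplification: the function name is invented, and real switches also handle the FCoE Initialization Protocol and VLAN tagging, which are ignored here.

```python
FCOE_ETHERTYPE = 0x8906  # EtherType carried by FCoE frames
IPV4_ETHERTYPE = 0x0800  # ordinary LAN traffic, used here as an example


def converged_switch_route(ethertype):
    """Decide where a frame goes based on its EtherType.

    Hypothetical helper: FCoE frames are unwrapped and passed to the
    FC SAN, everything else is forwarded to the LAN unchanged.
    """
    if ethertype == FCOE_ETHERTYPE:
        return "FC SAN"
    return "LAN"


if __name__ == "__main__":
    for ethertype in (FCOE_ETHERTYPE, IPV4_ETHERTYPE):
        print(f"0x{ethertype:04X} -> {converged_switch_route(ethertype)}")
```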
The most common type of Fibre Channel SAN today is a switched fabric. Just to set the scene, here are some common terms that are used when talking about a SAN.
A 'Node' is a server or a storage device.
A 'Fabric' is the network that connects these nodes together, and includes the network switches.
A 'Domain' is a single switch within a fabric. A fabric can contain up to 239 domains, and in theory, the fabric can scale to about 16 million connections.
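The "about 16 million" figure comes from the 24-bit Fibre Channel port address, which is split into three 8-bit fields: domain, area and port. The short calculation below shows both the raw address space and the number that remains when only 239 domain values are usable. The function and constant names are illustrative.

```python
ADDRESS_BITS = 24   # a Fibre Channel port address is 24 bits long
MAX_DOMAINS = 239   # domain IDs available to switches in one fabric

# Raw size of the address space: 2^24.
TOTAL_ADDRESSES = 2 ** ADDRESS_BITS

# Addresses reachable when only 239 domains are used,
# each with 256 areas of 256 ports.
FABRIC_ADDRESSES = MAX_DOMAINS * 256 * 256


def split_address(address):
    """Split a 24-bit address into its (domain, area, port) bytes."""
    return (address >> 16) & 0xFF, (address >> 8) & 0xFF, address & 0xFF


if __name__ == "__main__":
    print(f"2^24 addresses:      {TOTAL_ADDRESSES:,}")
    print(f"239 domains x 65536: {FABRIC_ADDRESSES:,}")
    print("0x010200 ->", split_address(0x010200))
```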
Switches can vary in size from small 16 port devices with little or no redundancy, to large 'directors' with hundreds of ports and no single point of failure. The principle behind designing a SAN is to optimise performance, management and scalability, within cost constraints of course.
There are five main types of fabric:
Single switch is the simplest fabric, but it can vary from a single 16 port switch to connect a few servers and a couple of storage devices, to a large director with hundreds of ports that connects a large enterprise together. A small switch is a single point of failure and will not scale, but can be a good way to start out.
Large switches, or directors are sophisticated pieces of kit and can have enough redundant components internally that they have no internal SPOF. In this situation, is the director then a SPOF? I've heard that one argued both ways. A large switch is also scalable, until it runs out of spare ports.
Now a variation on a single switch is two switches (bear with me). If all your servers and storage devices are dual pathed, and you have a multi-pathing failover capability, then you can build a very resilient network with two single switch fabrics. Every device is connected to both fabrics, so if any part of one fabric goes down, all will work fine on the other fabric until the problem is fixed. This is a much better option than a single switch.
Single Switch / Dual Fabric SAN
In a Cascade SAN, the switches are simply inter-connected in a line as shown below. There may or may not be a top level switch. An issue with this design is that you do not want to have to go through several hops to connect devices at either end of the line, so you really need to localise your paths through the fabric so that they cross no more than one ISL and, ideally, stay within the same switch. This makes a cascade design difficult to scale or change later. On the plus side, a cascade design does not need too many ISLs. The main issue with a cascade SAN is that if a switch fails, then some switches will not be able to communicate with each other. For that reason it is rarely seen these days.
A Loop fabric is essentially just a cascade fabric with the bottom switches connected together to form a ring. It is slightly harder to extend than a cascade, as you need to break the loop to install another switch. However, if one switch fails the other switches can still communicate with each other. Otherwise it has the same drawbacks and benefits as a cascade SAN.
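The effect of a switch failure on a cascade and on a loop can be checked with a small graph model. The sketch below assumes the simplest form of each design (a plain line of switches for the cascade, a ring for the loop, at least three switches) and uses a breadth-first search to see whether the surviving switches can still all reach each other. The function names are made up for this example.

```python
from collections import deque


def cascade(count):
    """Switches 0..count-1 joined in a line by ISLs."""
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < count} for i in range(count)}


def loop(count):
    """A cascade with the two end switches joined to form a ring."""
    return {i: {(i - 1) % count, (i + 1) % count} for i in range(count)}


def survivors_connected(links, failed):
    """True if every switch except the failed one can still reach the rest."""
    alive = [switch for switch in links if switch != failed]
    seen = {alive[0]}
    queue = deque([alive[0]])
    while queue:
        current = queue.popleft()
        for neighbour in links[current]:
            if neighbour != failed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(alive)


if __name__ == "__main__":
    print("cascade, middle switch fails:", survivors_connected(cascade(4), failed=1))
    print("loop, same switch fails:     ", survivors_connected(loop(4), failed=1))
```

In the cascade, losing a middle switch splits the fabric in two, while in the loop the traffic can go round the other way.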
In a Full Mesh SAN, every switch is connected to every other switch with an ISL. The advantage of this approach is that you can connect any device to any open port in the fabric, and know that it can connect to any other device after just one switch hop. The big disadvantage is scalability. When you add a new switch, it must be connected to every other switch in the fabric. It is obviously not suitable for low port count switches, as then most of the ports are used for ISLs. It can be scalable and effective for big switches, where the total port count is less than 2,000.
Full Mesh SAN
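The scalability problem with a full mesh is easy to quantify: with n switches there are n(n-1)/2 switch pairs, and each switch gives up n-1 ports to ISLs. The sketch below assumes a single ISL between each pair of switches and identical switches throughout; the function names are illustrative.

```python
def full_mesh_isls(switches):
    """Number of ISLs needed to connect every switch to every other switch."""
    return switches * (switches - 1) // 2


def free_ports(switches, ports_per_switch):
    """Ports left for servers and storage after ISLs are cabled.

    Assumes one ISL per switch pair, so each switch loses (switches - 1) ports.
    """
    per_switch = max(ports_per_switch - (switches - 1), 0)
    return switches * per_switch


if __name__ == "__main__":
    for count in (2, 4, 8, 16):
        print(f"{count:2} x 16-port switches: "
              f"{full_mesh_isls(count):3} ISLs, {free_ports(count, 16):3} free ports")
```

With 16-port switches the free port count stops growing as the mesh gets larger, which is why the design only makes sense for high port count switches.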
A Core/Edge SAN is a logical progression from full mesh as it does away with the requirement for lots of ISLs while preserving the one switch hop rule. It uses a high performance, highly available director for the core switch, which is connected directly to high performance servers and storage. Appliances that need lesser performance are connected to the core by slower edge switches. In some implementations, the storage devices are connected to the core and the servers to the edge switches.
Once you start building large SANs with lots of switches, you might not want to go to the expense of having two complete and separate fabrics for failover, but you still want redundancy. A Federated Fabric SAN contains redundant switches, so that every server is connected to two switches, and has two independent paths through the SAN to the storage. Again, the host servers must have multi-pathing software that can automatically fail over if a path fails, and ideally load balance when two paths are available.
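The behaviour expected from multi-pathing software, load balancing while both paths are up and failing over when one goes down, can be sketched as follows. This is a toy model with a simple round-robin policy, not the algorithm used by any real multi-path driver; the class name, method names and path labels are all hypothetical.

```python
class MultipathDevice:
    """Toy model of a host with several paths to the same storage."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.healthy = set(self.paths)
        self._next = 0  # round-robin position

    def fail(self, path):
        """Mark a path as unusable, for example after a switch failure."""
        self.healthy.discard(path)

    def restore(self, path):
        """Bring a repaired path back into service."""
        if path in self.paths:
            self.healthy.add(path)

    def pick_path(self):
        """Return the next healthy path, skipping any that have failed."""
        if not self.healthy:
            raise RuntimeError("no healthy path to storage")
        while True:
            candidate = self.paths[self._next % len(self.paths)]
            self._next += 1
            if candidate in self.healthy:
                return candidate


if __name__ == "__main__":
    device = MultipathDevice(["via_switch_a", "via_switch_b"])
    print([device.pick_path() for _ in range(4)])  # alternates between paths
    device.fail("via_switch_a")
    print([device.pick_path() for _ in range(2)])  # only the surviving path
```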
When a complex fabric is started up, it needs something to make sure that the switches are connected correctly, have unique domain ids, and are time synchronised. This can be done using fabric management software, but the Fibre Channel standard allows for a 'Principal Switch' or master switch to manage the network, and this needs to be your best performing switch.
An Edge-Core-Core-Edge design is a variant of this model, often used for two site deployments. The two core switches are a large distance apart, and are usually connected by DWDM switches. The connecting links between the core switches can be called a backbone. This model can have redundancy either by having two core switches on each side of the backbone, or by having two separate fabrics, with servers and storage connected to each fabric.
Virtual fabrics can be created by partitioning physical switch ports into several logical switches. This can improve switch utilisation.