Investigating Hardware issues

If you have problems with your SAN, a good first step is to contact your vendors, as they should provide troubleshooting support. They should also provide utilities and instructions for managing and configuring their products. Most SAN devices provide monitoring tools that report errors and SAN configuration details. A data gateway should report all the FC devices, as well as the SCSI devices, that are available in the SAN. A switch should report information about the SAN fabric. Check that the drivers and firmware on each device are up to date.

HBAs should be set appropriately for the SAN topology being used. For example, if your SAN is an arbitrated loop, the HBA should be set for this configuration. If the HBA connects to a switch, the HBA port should be set to "point to point" and not "loop".
The ports on SAN switches need to be configured so that they match the HBA values. For example, if the FC switch maximum speed is 1 Gb/sec, the HBA should either be set to this value or set to auto-negotiate. The switch ports also need to be configured for the type of SAN topology being implemented, so if the SAN is an arbitrated loop, the port should be set to FL_PORT, and if the HBA is connected to a switch, the HBA options should be set to point-to-point and not to loop.
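
The matching rule above can be expressed as a small check. This is only a sketch: the `PortSetting` class, its field names, and the convention that a speed of 0 means auto-negotiate are assumptions made for illustration, not any vendor's API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PortSetting:
    """Hypothetical record of one end of a link (HBA port or switch port)."""
    topology: str    # "loop" or "point-to-point"
    speed_gbps: int  # link speed in Gb/sec; 0 is used here to mean auto-negotiate


def mismatches(hba: PortSetting, switch_port: PortSetting) -> List[str]:
    """Return a description of every setting that differs between the two ends."""
    problems = []
    if hba.topology != switch_port.topology:
        problems.append(
            f"topology: HBA is {hba.topology}, switch port is {switch_port.topology}"
        )
    # A fixed speed only conflicts with another fixed speed; auto-negotiate matches anything.
    both_fixed = hba.speed_gbps != 0 and switch_port.speed_gbps != 0
    if both_fixed and hba.speed_gbps != switch_port.speed_gbps:
        problems.append(
            f"speed: HBA is {hba.speed_gbps} Gb/sec, switch port is {switch_port.speed_gbps} Gb/sec"
        )
    return problems


# Example: an HBA left in loop mode at a fixed 2 Gb/sec, attached to a 1 Gb/sec fabric port.
for problem in mismatches(PortSetting("loop", 2), PortSetting("point-to-point", 1)):
    print(problem)
```

In practice the settings would come from your HBA utility and switch management tool; the check itself is just a field-by-field comparison.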

Dual Fabrics

The most robust way to design a SAN is to have two independent fabrics, and to connect every server and every storage device to both. That provides you with total resilience. All SAN switches have a LAN port, which you connect to for configuration and maintenance. If you connect the switches together using ISLs, then you can see every switch from one LAN connection. So if you are installing new firmware, you install it on one switch and let it propagate out to the others. If you have a dual fabric, then don't connect any switch in one fabric to any switch in the other. That keeps the fabrics totally independent. Then, if there is a problem with the firmware, it will only affect one half of your SAN, and the other half should work fine.
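
One way to audit the "no links between fabrics" rule is to list every ISL whose two ends sit in different fabrics. The sketch below assumes you already have an inventory of which switch belongs to which fabric and a list of ISLs; the switch names and the `cross_fabric_isls` function are hypothetical.

```python
from typing import Dict, List, Tuple


def cross_fabric_isls(
    fabric_of: Dict[str, str], isls: List[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Return every ISL that joins switches belonging to different fabrics."""
    return [(a, b) for a, b in isls if fabric_of[a] != fabric_of[b]]


# Hypothetical inventory: two switches in fabric A, two in fabric B.
fabric_of = {"sw_a1": "A", "sw_a2": "A", "sw_b1": "B", "sw_b2": "B"}
isls = [("sw_a1", "sw_a2"), ("sw_b1", "sw_b2"), ("sw_a2", "sw_b1")]

for a, b in cross_fabric_isls(fabric_of, isls):
    print(f"ISL {a} <-> {b} joins two fabrics and should be removed")
```

An empty result means the two fabrics share no ISLs, so a firmware problem that propagates over ISLs stays inside one fabric.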

Fan-out and Fan-in

Fan-out is how many hosts can be attached to a storage path.
Fan-in is how many storage ports can be served from a single host channel.

How many servers can you run down a single fibre channel? As you might expect, there is no simple answer to this; it depends on the FC capacity and the amount of work your servers are doing. You can work this out if you know how much traffic each server is generating. The expected total I/O should not exceed the channel I/O, especially if VMs are in the picture. You want to get the right balance between getting maximum use out of your fibre capacity and delivering good performance to your hosts. A starting rule of thumb is shown below, but be sure to monitor port usage and avoid overloading your system.
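
The arithmetic described above can be sketched as follows. The server loads, the channel capacity, and the 20% headroom figure are made-up illustrative values, not recommendations, and the function names are hypothetical.

```python
from typing import Sequence


def channel_utilisation(server_mb_per_sec: Sequence[float], channel_mb_per_sec: float) -> float:
    """Fraction of the channel's capacity used by the combined server load."""
    return sum(server_mb_per_sec) / channel_mb_per_sec


def fits_on_channel(
    server_mb_per_sec: Sequence[float], channel_mb_per_sec: float, headroom: float = 0.2
) -> bool:
    """True if the combined load leaves at least `headroom` of the channel free."""
    return channel_utilisation(server_mb_per_sec, channel_mb_per_sec) <= 1.0 - headroom


# Hypothetical peak loads (MB/sec) for four servers sharing one 200 MB/sec channel.
loads = [40.0, 25.0, 60.0, 15.0]
channel = 200.0
print(f"utilisation: {channel_utilisation(loads, channel):.0%}")
print("fits with 20% headroom:", fits_on_channel(loads, channel))
```

Use measured peak figures rather than averages where you can, since it is the peaks that overload a shared channel.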

When designing your SAN, you must consider the future. Your growth rate will determine the number of connections you will require, and your SAN must be scalable, as far as you can predict the future. A core-edge SAN topology is arguably the best. Your SAN also needs to be available, so you need two independent paths from every server to the data. The paths should be routed through two directors, or two independent switch paths. If two switches are connected by an inter-switch link (ISL), then you should ensure that there is a minimum of two ISLs between them. There should be no more than six initiator and target ports per ISL.
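
The two ISL rules above (at least two ISLs, and no more than six initiator and target ports per ISL) can be combined into one calculation. This sketch assumes the six-port limit applies to the ports whose traffic crosses between the two switches; the function name and parameters are hypothetical.

```python
import math


def isls_needed(ports_crossing: int, max_ports_per_isl: int = 6, minimum_isls: int = 2) -> int:
    """Number of ISLs between two switches that satisfies both rules of thumb."""
    by_port_count = math.ceil(ports_crossing / max_ports_per_isl)
    return max(minimum_isls, by_port_count)


# A few illustrative port counts.
for ports in (4, 12, 13, 20):
    print(f"{ports} ports crossing -> {isls_needed(ports)} ISLs")
```
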
Your initial design should include some free ports for growth, but eventually you will need more switches. This is where a two-tier switch approach can help, as it improves scalability. The top tier connects to the servers, and the bottom tier connects to the storage, with each switch in the top tier connected to each switch in the bottom tier. This makes it easy to extend the fabric, while providing more redundant paths and better bandwidth.
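
Because every top-tier switch connects to every bottom-tier switch, the number of ports consumed by ISLs grows with both tiers. A rough sketch of that bookkeeping is shown here; the function name and the example switch counts are assumptions for illustration.

```python
from typing import Dict


def two_tier_isl_ports(top_switches: int, bottom_switches: int, links_per_pair: int = 1) -> Dict[str, int]:
    """ISL counts for a two-tier design where each top switch links to each bottom switch."""
    return {
        "total_isls": top_switches * bottom_switches * links_per_pair,
        "isl_ports_per_top_switch": bottom_switches * links_per_pair,
        "isl_ports_per_bottom_switch": top_switches * links_per_pair,
    }


# Hypothetical design: 4 server-facing switches, 2 storage-facing switches, 2 ISLs per pair.
print(two_tier_isl_ports(top_switches=4, bottom_switches=2, links_per_pair=2))
```

Subtracting the ISL ports from each switch's port count shows how many ports remain for servers or storage, which is worth checking before adding another switch to either tier.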

When working out how many paths are needed to a storage subsystem, remember that UNIX and Windows cannot share physical paths with other operating systems.


If switches are cascaded in a fabric, then they can all be monitored and managed from a single screen.
16-port switches have an extra Ethernet connection. One switch needs to have this connected to the network.

Forward all your syslogs, historical logs and switch messages to one central location to simplify troubleshooting.

Enforce the use of personal accounts rather than global shared accounts, and audit usage so that changes can be tracked.

Make sure that your host operating systems are updated with the latest HBA drivers and firmware. This can both improve performance and resolve storage connectivity problems. Where you have several hosts sharing a storage resource, such as a disk subsystem or a tape library, it is best to keep the HBA drivers consistent across all the hosts.

Try not to intermix different vendors' FC switches and HBAs on the same SAN, as mixing them could lead to compatibility and performance issues.

Use only one initiator per SAN zone.
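
Single-initiator zoning means each zone contains exactly one HBA port plus the storage ports it needs. As a sketch, a zone list could be generated from an inventory like this; the `z_` naming scheme, the port names, and the WWPNs are all made-up placeholders, and the output is a plain dictionary rather than any switch vendor's command syntax.

```python
from typing import Dict, List


def single_initiator_zones(
    initiators: Dict[str, str], targets_for: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Build one zone per initiator port: the initiator WWPN first, then its target WWPNs."""
    zones = {}
    for port_name, initiator_wwpn in initiators.items():
        zones[f"z_{port_name}"] = [initiator_wwpn] + list(targets_for.get(port_name, []))
    return zones


# Placeholder WWPNs for two HBA ports on one host and one storage port.
initiators = {
    "host1_hba0": "10:00:00:00:00:00:00:01",
    "host1_hba1": "10:00:00:00:00:00:00:02",
}
targets_for = {
    "host1_hba0": ["50:00:00:00:00:00:00:a1"],
    "host1_hba1": ["50:00:00:00:00:00:00:a1"],
}

for zone, members in single_initiator_zones(initiators, targets_for).items():
    print(zone, members)
```
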

Use a dedicated VLAN for storage traffic on an iSCSI SAN, and use jumbo frames. Keep your iSCSI initiator version current and set the iSCSI port settings to Ethernet full duplex to get the best performance.

Windows Clusters and SANs

Every Windows cluster should be configured into its own SAN zone. Storage LUNs must be available to every node in the cluster, and visible to that cluster only to prevent data corruption.

All HBAs in a cluster must be of the same type and running the same firmware level, and all the device drivers must be running the same software version.
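
A consistency rule like this is easy to check once the details for each node have been collected. The sketch below assumes a hypothetical inventory keyed by node name, with `hba_model`, `firmware` and `driver` fields; how you gather those values depends on your HBA vendor's tools.

```python
from typing import Dict, Set


def inconsistent_fields(nodes: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Return each field that has more than one distinct value across the cluster nodes."""
    result = {}
    for field in ("hba_model", "firmware", "driver"):
        values = {info[field] for info in nodes.values()}
        if len(values) > 1:
            result[field] = values
    return result


# Hypothetical two-node cluster where one node has a different driver version.
cluster = {
    "node1": {"hba_model": "model-x", "firmware": "1.2", "driver": "9.0"},
    "node2": {"hba_model": "model-x", "firmware": "1.2", "driver": "9.1"},
}

for field, values in inconsistent_fields(cluster).items():
    print(f"{field} differs across nodes: {sorted(values)}")
```
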

Never add an arbitrated loop HBA into a switched fabric SAN, as this can cause the whole fabric to go down.

If you connect a server with multiple HBAs, always load the multi-path driver software. Otherwise, when the server sees two HBAs it will assume they are on different buses and give each disk two different device numbers. It will then apparently see two disks with the same disk signature and try to rewrite one of them. The disk will then fail and the data could be corrupted.

If you use a storage subsystem snapshot facility to create a copy of a volume, it will have the same disk signature as the original. If you try to mount the snapshot to the same server as that hosting the original disk, the server will overwrite the snapshot disk signature. If you mount the disk to another server in the cluster, you will have two identical disks in the cluster and will corrupt your data. The answer is to mount the snapshot disk to a server that is not in the cluster.

Disks must be added to the cluster as cluster resources. Zone the disk to one server first, add it as a cluster resource, then zone it to the rest of the servers in the cluster.


Lascon updates

I retired 2 years ago, and so I'm out of touch with the latest in the data storage world. The Lascon site has not been updated since July 2021, and probably will not get updated very much again. The site hosting is paid up until early 2023 when it will almost certainly disappear.
Lascon Storage was conceived in 2000, and technology has changed massively over those 22 years. It's been fun, but I guess it's time to call it a day. Thanks to all my readers in that time. I hope you managed to find something useful in there.
All the best
