Unfortunately, Novell NetWare is pretty much dead as an operating system. These pages will not be updated anymore, but will be retained for a while for the benefit of the faithful who continue to use this excellent operating system.
Novell did create Open Enterprise Server, a SUSE Linux-based OS that runs most of the old NetWare server functions.
Novell introduced server clustering in NetWare 5, then enhanced it in NetWare 6. This section discusses Novell Cluster Services 1.6 from a storage perspective.
A cluster is a group of file servers; servers are often called nodes in Novell documentation. A NetWare 6 cluster contains between 2 and 32 servers. All servers in the cluster must be configured with IP and be on the same IP subnet. All servers in the cluster must be in the same NDS tree, and the NDS tree must be replicated on at least two, but not more than six, servers in the cluster. NetWare 5 and NetWare 6 clusters can coexist in the same NDS tree. Each server must have at least one local disk device for the SYS: volume; you normally connect your data disks to a cluster using a SAN.
Clustering allows services to survive the failure of a server. Any disks that were mounted on the failed server are switched to one of the other servers in the cluster. Any applications that were active, and any users who were logged on to the failed server, are switched to another server. This is called failover, and all users typically regain access to their resources in seconds, with no loss of data and usually without having to log in again.
It is also possible to manually invoke a failover if you need to bring down a server for maintenance or a hardware upgrade.
Novell Cluster Services 1.6 consists of a number of management modules, or NLMs. The storage-related modules are discussed in the sections that follow; there are six other cluster management NLMs which are not discussed here.
You manage the cluster with CLUSTER commands from the system console. You can see the full list of cluster commands by typing the following at the console:
HELP CLUSTER
Some useful commands are
CLUSTER VIEW
Displays the current node and a list of all the nodes (servers) in the cluster.
CLUSTER RESOURCES
Displays the list of resources managed by the cluster, and shows which node has ownership of which resource.
You can force a resource to move to a different node with the command
CLUSTER MIGRATE resource-name node-name
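For example, assuming a cluster-enabled pool resource named CAV1_SERVER that you want to move to a node called NODE2 (both names are purely illustrative), you would enter:
CLUSTER MIGRATE CAV1_SERVER NODE2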
NetWare provides a few screens to monitor cluster operations. The Logger screen displays loaded NLMs and NSS operations such as the enforcement of directory quotas. The Cluster Resources screen displays volume mount and dismount messages.
Storage pools are containers for logical volumes. A Cluster Services pool is simply an area of storage space created from the available NetWare partitions. With NSS 3.0, these can be virtual partitions, and they can support a mixture of NSS and non-NSS volumes. A storage pool must be either all local or all shared.
A shared storage pool can only be in use by one cluster node at a time to ensure data integrity; data corruption would most likely occur if two or more nodes had access to the same shared storage pool simultaneously. This is managed by the Cluster System Services NLM.
Failover in Cluster Services 1.6 is by storage pool, whereas NetWare 5 did failover by volume. If a shared storage pool is active on a node when that node fails, the cluster automatically migrates the pool to another node. The clustering software reactivates the pool and remounts the cluster-enabled logical volumes within that pool.
Inside the storage pools are the logical volumes, which hold the files and folders for users and applications. The volumes are only visible and accessible when the pool is active. As the logical volumes have no hard size limit, they can request more space from the storage pool as needed.
When you define a pool with its volumes, you have to cluster-enable all the volumes. This creates a virtual server for each cluster-enabled volume, with its own server name and IP address. Applications and users access the volume through the virtual server name and IP address. This means that if the hosting server fails and the volume fails over to another server, clients are not affected and the IP address of the shared disk does not change.
In NetWare 5.1 the virtual server name was generated by the system and had the format NDStreename_diskname_server. DNS could not understand the underscores in the name, so the IP addresses had to be hardcoded. NetWare 6 removes this restriction; you can override the default name with a name that DNS can understand.
Logical volumes have a new attribute called Flush On Close, which simply means that when a file is closed, the cache is flushed to disk. So when you close a file, you can be confident that the data is safely stored on disk and is not sitting in cache; if a server fails, any data resident in cache will be lost. Flush On Close is set to 'ON' on the server, and will have some performance overhead.
The NDS information which is used to identify, name and track all NetWare objects is stored by the CLSTRLIB NLM. NetWare 5 had a problem with file control on SAN systems, as some NDS information was not transferred when a volume was migrated between servers.
The issue was that the trustee IDs for each user object were different for each server. On failover, it took several minutes to scan the entire file system and translate the trustee IDs to the new server, so the file trustee IDs were usually not translated at failover. The result was that disk space and directory space restrictions were not preserved.
In NetWare 6, server-linked trustee IDs are replaced with Globally Unique IDs (GUID), which are the same across all servers where the user has trustee rights of any kind. Volumes can now failover in seconds, and all trustee rights are preserved.
Backups had a similar problem. A file had to be restored to the same server it was backed up from, or trustee IDs would not match and the file could be corrupted. With NetWare 6 and NCS 1.6 any file can be backed up from any server and restored by any server without file corruption. The GUID remains intact, along with the appropriate user restrictions, regardless of physical server used for the backup and restore operation.
NetWare clusters have two types of disk: local disks and clustered disks. The SYS: disk will probably be local, and you may have others. The local disks are always attached to one particular server, while the clustered disks can move around the various servers in the cluster. 'Takeover scripts' are used to make sure that the disks move cleanly between servers. TSM backups can be 'cluster aware'; that is, they can move with the disks as the disks move between the cluster servers.
Actually, it is not the disks that move around between servers but NetWare partitions. To keep things simple, many sites set up each disk in its own NetWare partition, but your site may have several disks in each partition. When the TSM manuals refer to a 'cluster group', they really mean a NetWare partition.
The TSM software has to run on a physical server, but there is normally no way to decide ahead of time which physical server will be hosting a volume.
The key to backing up a cluster volume is that the backup metadata must be available from whichever server is hosting that volume, so the metadata must be held on the cluster volume. The metadata includes the dsm.opt file and the password file. The schedlog, errorlog and webclient log also need to be held on the cluster volume to get continuity between messages as the volume moves between servers. Every NetWare partition needs its own dsm.opt file.
Just use a standard TSM install on each of the physical servers. The dsm.opt file should specify CLUSTERNODE NO (or omit it, as that is the default). With this setting, if you use a domain of ALL-LOCAL then the client will not see the clustered disks. The NODENAME should be the same as the server name.
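As a minimal sketch, the dsm.opt on a physical server might then look something like this; the node name NW6SRV1 and the server address are illustrative, and the communication options will depend on your own TSM server:
NODENAME           NW6SRV1
DOMAIN             ALL-LOCAL
CLUSTERNODE        NO
PASSWORDACCESS     GENERATE
COMMMETHOD         TCPIP
TCPSERVERADDRESS   tsm.example.com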
Each NetWare partition must be defined to TSM as a separate node, and must have a unique name that is not the same as any physical server name. As each partition will have a virtual server name, it is easiest to use that as the node name.
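For example, if the virtual server for the partition is CAV1_SERVER, it could be registered on the TSM server with something like the command below; the password and policy domain names are illustrative:
REGISTER NODE CAV1_SERVER secretpw DOMAIN=NETWARE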
Allocate a TSM directory on a volume in the partition and copy a dsm.opt file into it. Assuming that you are storing the TSM information on a disk called CAV1, edit the dsm.opt file with the following settings:
NODENAME         CAV1_SERVER
DOMAIN           CAV1
CLUSTERNODE      YES
PASSWORDDIR      CAV1:\TSM\PASS\
PASSWORDAccess   GENERATE
NWPWFile         YES
OPTFILE          CAV1:\TSM\DSM.OPT
ERRORLOGName     CAV1:\TSM\DSMERROR.LOG
SCHEDLOGName     CAV1:\TSM\DSMSCHED.LOG
To set up the passwords, enter the following commands from your first clustered server:
Unload TSAFS, then reload it with TSAFS /cluster=off
dsmc query session -optfile=CAV1:/tsm/dsm.opt
dsmc query tsa -optfile=CAV1:/tsm/dsm.opt
dsmc query tsa nds -optfile=CAV1:/tsm/dsm.opt
Make a copy of dsmcad in the SYS:/Tivoli/tsm/client/ba/ directory and give it a unique name for this volume, say DSMCAD_CAV1, then start the scheduler with
dsmcad_CAV1 -optfile=CAV1:/tsm/dsm.opt
Repeat this for every server in the cluster, and add the DSMCAD command to the takeover scripts so that the correct DSMCAD is started as a volume moves between clustered servers.
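As a sketch, assuming the load and unload scripts for the CAV1 resource used in the example above, the load script would get this line appended to the end, after the pool is activated and the volume mounted:
dsmcad_CAV1 -optfile=CAV1:/tsm/dsm.opt
and the unload script would get the matching line added near the top, so the scheduler is stopped cleanly before the pool is deactivated:
unload dsmcad_CAV1
The exact module name to unload depends on what you called your copy of dsmcad on each server.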