SQL Database Continuous Availability

SQL Server Clustering

SQL Server has an 'AlwaysOn' feature which offers 100% availability and it uses Windows Server Failover Clustering (WSFC) to achieve this. A Failover Cluster Instance or FCI is a single instance of SQL Server that is installed across several Windows Server nodes and, possibly, across multiple subnets. Externally, an FCI appears to be a single instance of an SQL Server running on a single computer, but the FCI provides failover from one WSFC node to another if the current node becomes unavailable.

The WSFC service manages the server cluster, quorum configuration, failover policy, and failover operations, as well as the Virtual Network Name (VNN) and virtual IP addresses for the FCI. The WSFC cluster node which currently owns the resource group is called the active node. If the active node fails, or the service is moved for a planned upgrade, the resource group ownership is moved to another node in the FCI. A WSFC cluster can run multiple FCIs, with individual FCIs active on different nodes.

An SQL Server FCI runs in a WSFC resource group and there are some rules regarding the WSFC Resource Group hardware and software. The physical servers must contain a similar hardware configuration and an identical software configuration right down to operating system version and patch level. The SQL Server code and configuration must be identical on every node in the cluster. This includes software version, patch level, components, and the instance name.
The product binaries are installed locally on every node of the FCI, but the services are not started automatically, but managed by WSFC.

All the nodes in the FCI must have the same view of instance data whenever a failover occurs, so an FCI must use shared storage between all nodes of the FCI for database and log storage. The shared storage can be either SAN storage, file shares or WSFC cluster disks. The shared storage is a potential single point of failure so it should be raided and mirrored to protect it.

An FCI has a Virtual Network Name (VNN), which means that applications can connect to the VNN without the need to know the current active node. When a new active node starts up in a failover event, the VNN is registered to this new active node. As the applications are connected to the VNN, this means that the process is transparent to the client or application connecting to SQL Server and so the application downtime is minimised. If you use multi-subnet FCIs then a virtual IP address is assigned to each subnet. If a failover involves changing subnets then the VNN is updated on the DNS server to point to the virtual IP address of the surviving subnet. This means that applications and clients can continue to use the same VNN after a multi-subnet failover.

Depending on how the nodes are clustered, the SQL Server failover cluster is configured in the following ways:

  • Nodes on the same subnet or the same set of subnets - The IP address resource dependency is set to AND for these types of configurations.
  • Nodes on different subnets - The IP address resource dependency is set to OR and this configuration is called a SQL Server multi-subnet failover cluster configuration.

The OR IP address resource dependency option was introduced with SQL Server 2012. It means that the SQL Server can be online when there is at least one valid IP address that it can bind to, which can be important as the IP addresses are not owned by all the nodes in the failover cluster, and may not be all online during SQL Server startup. Earlier releases of SQL server relied on a stretched V-LAN to expose a single IP address for failover across sites.

A SQL Server multi-subnet failover cluster is a configuration where each failover cluster node is connected to a different subnet or different set of subnets. These subnets can be in the same location or in geographically dispersed sites. Clustering across geographically dispersed sites is sometimes referred to as stretch clusters. As there is no shared storage that all the nodes can access, data should be replicated between the data storage on the multiple subnets. With data replication, there is more than one copy of the data available. Therefore, a multi-subnet failover cluster provides a disaster recovery solution in addition to high availability.

Various ways exist to configure multiple subnets. You could have 2 nodes, each on a different subnet; 3 nodes, 2 on one subnet and 1 on the other; one node connected to both subnets and one node connected to just one subnet. In all these cases SQL Server Setup sees this configuration as a multi-subnet cluster and sets the IP address resource dependency to OR.
However if you have 2 nodes and 2 subnets, and connect both nodes to both subnets, then this configuration is not considered as a multi-subnet failover cluster configuration because the clustered nodes are on the same set of subnets so the IP address resource dependency is set to AND by SQL Server Setup.

You have options for OR configuration, for example, when there is a failure of one of the IP addresses on the node that currently owns the SQL Server cluster resource group, a failover is not triggered automatically until all the IP addresses valid on that node fail. When a failover occurs, SQL Server will come online if it can bind to at least one IP address that is valid on the current node. The IP addresses that did not bind to SQL Server at startup will be listed in the error log.

It is possible to install an SQL Server FCI instance and a standalone SQL Server instance on the same server node, but if you do this you need to make sure the TCP port assignments do not clash. The default TCP port is 1433 and it is usually best to leave the default for the FCI instance and configure the standalone instance with a fixed port.

Database Snapshots

A database snapshot is a read-only view of a SQL Server database frozen at a moment in time. The original database is called the source database and a database snapshot always resides on the same server instance as its source database.

The ability to take database snapshots came with SQL Server 2005. Database snapshot is the same in principle to disk based snapshots, except that with disks the unit of data is a block, whereas the unit of a database snapshot is a page. A snapshot is a point-in-time copy of a database that can be used for rapid backups, or for testing purposes. Snapshots could also be used for applications that only run in read-only mode against a consistent point in time copy of a database. An excellent example would be month end or year end reports.
The initial copy is made by creating a set of pointers that refer to the pages in the original or source database. As the source database is updated, the original pages are copied to the snapshot, thus preserving the point in time condition of the snap without having to copy all the data initially. A user initiated database snapshot is implemented by using sparse files, an NTFS feature. Initially a sparse file contains no user data and has no disk space allocated. This process is often called copy-on-write.

You can have create several snapshots on a given source database and they will persist until they are explicitly dropped by the database owner. Database snapshots are dependent on the source database and if that database becomes unavailable for any reason, all of its database snapshots also become unavailable. Backing up the source database works normally; it is unaffected by database snapshots. Database snapshots always work on an entire database.

 animation showing database snapshot process

SQL Server creates its own database snapshots when it issues some DBCC commands like DBCC CHECKTABLE. These snapshots are implemented using sparse alternate data streams.

A database snapshot is consistent. There is no need to quiesce or stop the database before taking a snapshot as SQL Server ensures consistency of the snap copy by rolling back any uncommitted transactions at the point the snap was taken. Transactions are applied as normal to the source database. Because a snapshot database gets pages that have not been updated from the source database, the source database must be available and both databases must be in the same phyysical server for the snap to work.

A database snapshot is read only. From the user's point of view the snapshot database always looks unchanged because reads always access the original data at the point of snap. If a page has not yet been updated on the source database, then a read operation on the snapshot reads the original page from the source database. If a page from the source has been updated, then the original page will be copied to the source file before the update is written to disk. So after a page has been updated, a read operation on the snapshot still accesses the original page, which is now stored in a snapshot file.

A database snapshot does not use much space. Well, that statement is not exactly true. Physical space occupied by a database snapshot will grow larger the longer the snapshot exists, but the growth rate will depend on how often the source database is updated and how localised the updates are. A snapshot page is only copied the first time it is updated at the source, updates after that do not affect the snapshot. This means that a database could have a very high update rate, but those updates might be localised to only 10% of the database space, so the snapshot will never grow to much more than 10% of the source.
Sparse files grow in 64KB increments, and if there is no room on disk to add an extra 64KB the snapshot will be marked as suspect and has to be dropped. None of this affects the source database, unless it is competing for disk space with the snapshot. It is a good idea, especially for a busy database, to have a process for renewing snapshots at regular intervals to prevent them from using up too much disk space. It is also useful to monitor a database before a snapshot is taken, so you know how much update activity to expect and can get the snap sizing correct.

A database snapshot is dependent on the original database. If that database becomes unavailable then the snapshot is unusable. It is also possible to use a snapshot to revert a source database back to the point in time when the snapshot was created. This does not mean you do not need backups anymore as you cannot roll forward to any point in time from snapshots. Also, as database snapshots are dependent on the source database, if that database is destroyed by hardware failure, then the snapshots are also destroyed. One way to utilise the 'restore from snapshot' facility might be for testing purposes, when you take a snapshot before you start testing, run the tests, then revert back to the starting point using the snapshot.

There are limitations on how database snapshots can be used, here are some of them:

  • The source database must be online, unless the database is a mirror database within a database mirroring session.
  • You can create a database snapshot on any primary or secondary database in an availability group. The replica role must be either PRIMARY or SECONDARY, not in the RESOLVING state.
  • To create a database snapshot on a mirror database, the database must be in the SYNCHRONIZED mirroring state.
  • The database cannot be dropped, detached, or restored.
  • Performance is reduced, due to increased I/O on the source database resulting from a copy-on-write operation to the snapshot every time an original page is updated.
  • Files cannot be dropped from the source database or from any snapshots.
  • Since Snapshots are read-only, they cannot be upgraded. Therefore, database snapshots are not expected to be viable after an upgrade.
  • Snapshots of the model, master, and tempdb databases are prohibited.
  • You cannot change any of the specifications of the database snapshot files.
  • You cannot drop files from a database snapshot.
  • You cannot back up or restore database snapshots.
  • You cannot attach or detach database snapshots.
  • You cannot create database snapshots on FAT32 file system or RAW partitions as they do not support sparse files.
  • A database snapshot inherits the security constraints of its source database at the time of snapshot creation. Because snapshots are read-only, inherited permissions cannot be changed and permission changes made to the source will not be reflected in existing snapshots.

back to top