Storage Replica

Overview

Storage Replica is a software implementation of volume replication technology, new with Windows Server 2016, that is designed for disaster recovery. It protects against hardware failure by exactly duplicating a volume block by block, and will also protect against site failure if it is used on stretched clusters that span two sites. Like any other replication technology, Storage Replica does not eliminate the need to take backups, as data corruption or user errors would be replicated. If you are not familiar with replication technology then maybe we should spell that out. Replication keeps two copies of a disk in exact synchronisation. So, if you rely on replication for backups and you accidentally format and wipe a volume, Storage Replica will oblige and format and wipe the replica too, and you will have lost all your data. No backup, no data, lost forever. Take backups! You should ONLY back up the Data disk from the Source server. Don't back up your Storage Replica Log disks, as that can conflict with Storage Replica operations.
Storage Replica is also not intended to provide a second copy of data that can be updated, either locally or at a remote site, as that would not be a valid disaster recovery copy. However, the 2019 version of Storage Replica does let you take a snapshot of the replicated disk, and that snapshot can be accessed.
While there are many hardware replication solutions on the market, they typically require the same type of hardware on both sides. Because Storage Replica is a software implementation, it is storage-agnostic and supports unlike hardware.

Windows Server 2019 brings in three new Storage Replica options:
Storage Replica is now available on Standard Edition (SE), not just Datacenter Edition (DE). However, the Standard Edition has some limitations: DE replicates an unlimited number of volumes, but SE replicates just a single volume. SE servers can have one partnership instead of an unlimited number of partners. SE volume size is limited to 2 TB; DE volume size is unlimited.
Storage Replica Log performance is improved, so replication throughput and latency are much better, especially on all-flash arrays and Storage Spaces Direct (S2D) clusters that replicate between each other. All replicating servers must be at Windows Server 2019.
Storage Replica Test Failover. Previously, not being able to mount or read the replica meant you couldn't test it to see if it was working. With SR 2019 you can take a snapshot of the replica and mount that. The snapshot is writeable, so you can run a full DR test against it without affecting the replica. You could also take a backup from the snapshot.

Storage Replica supports both synchronous and asynchronous replication.

Synchronous Replication

When an application writes data out to storage, it waits until it gets confirmation that the write succeeded before it continues processing. With synchronous replication, the application waits until it gets write confirmation from both the local and the remote site. If you want to provide zero data loss disaster recovery, your second disk needs to be several kilometers away from the primary disk, so to prevent the replication from affecting application performance, you need to use fast networks and fast disk subsystems. So why would you want to use synchronous replication? Well, if you are working on financial systems like a bank, it should be obvious that it is essential that no financial transaction data be lost in a disaster. What about an airline booking site? Imagine the fuss if you had a disaster and lost 12 hours' worth of bookings, so your customers have paid for their flights, but you have no record of them.

One of the features of replication is that the destination volume is not accessible while replicating. I've seen people complain about this on blogs, but for one thing, data is replicated at block level, so if you could update at file level, you would corrupt the data. For another thing, this data is for disaster recovery. If you start updating the target disk, then it is no longer a valid DR copy. The destination volume will be dismounted when replication is configured, and while it is possible that its drive letter may be visible in Explorer, you will not be able to access the volume itself.
As mentioned above, with Windows Server 2019 you can take a snapshot of the replica and mount it as a writeable volume, which resolves this issue.

Asynchronous Replication

Asynchronous replication simply means that the application will consider a write complete when the data is safely stored on the source disks. Data is then written out to the remote disks later, without slowing down the application. This is quite adequate for many applications and is usually cheaper to implement than synchronous replication. Some implementations use snapshots for asynchronous replication, but Storage Replica implements asynchronous replication exactly like synchronous replication, just without the need to acknowledge the write at the destination disk. There is no guarantee that both sites have identical copies of the data at the time of a failure, but it will work over slower networks and longer distances than synchronous replication.
Because Storage Replica operates at the partition layer, it replicates any VSS snapshots created by Windows Server or backup software.

Storage Replica terminology

I've mentioned a few terms above without really explaining what they mean.

Supported configurations

With Windows Server 2016 Datacenter Edition, you can deploy storage replication in stretch cluster, cluster-to-cluster, and server-to-server configurations.

If you want to test DR or do site maintenance, you can switch the replication direction so your DR site becomes the primary. However, you must wait until the initial sync is complete before trying this.
Storage Replica uses consistency groups, where volumes can be grouped together and managed as an entity. For example, if you are replicating SQL databases that span multiple volumes, then it is essential that the replicated writes are sent out in the same order, otherwise the replicated database could be corrupt if a disaster happens. If the relevant volumes are in a consistency group, then Storage Replica will write out the data to the destination server in the correct order.


Using the GUI to Configure Storage Replica on Windows Server 2016

You can configure Storage Replica using PowerShell commands or with the Windows Admin Center interface. This section describes the Windows Admin Center method. To use it, you will need to download and install Windows Admin Center and Remote Server Administration Tools on your PC. You also need to do quite a bit of preparation work as follows:

Provide two servers, preferably in two different physical locations. If you are planning to use synchronous replication, then your two sites must be close enough for your network to provide an average round trip latency of 5ms or less. Each server should have at least 4GB RAM, and both need to be domain members in the same Active Directory forest. Install Windows Server 2016 Datacenter on both server nodes. It has to be the Datacenter edition, no other Windows Server 2016 variants will do.
To install Storage Replica:
  Navigate to Server Manager
  Select one of the servers.
  Navigate to Roles & Features.
  Select Features > Storage Replica, and then click Install.

Your storage can be SAS JBODs, fibre channel SAN, iSCSI target, or local SCSI/SATA, but should be a mix of HDD and SSD media. You need two sets of storage, one for each side, and each server must only be able to see the storage on its own site, no sharing. The physical storage must have identical sector sizes. Remember, the volume that contains the Windows Operating System cannot be replicated.
Configure the storage at each site into at least two volumes, one for data and one for the logs. The log volumes should be configured from SSD media and need to be sized to at least 9GB, and identically sized at each site. Both Log and Data volumes must be initialized as GPT, not MBR.

The two servers need a minimum of one ethernet/TCP connection on each server for synchronous replication, but Remote Direct Memory Access (RDMA) would be better. For synchronous replication, the bandwidth must be enough to maintain your I/O write workload with about a 5ms latency. You also need to configure your firewall to allow bi-directional communication between the servers. Ports 445 (SMB), 5445 (SMB Direct) and 5985 (WS-MAN) could be required, but check with Microsoft for the current list.
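
Before going any further, it can help to confirm that those ports are actually reachable from the source server. The following is a minimal sketch, assuming the destination server is called SECSERV01 (a made-up name, substitute your own) and that SMB uses its standard port 445. It only proves that a TCP connection can be opened; it does not tell you whether latency or bandwidth are good enough. Port 5445 may well show as unreachable if you are not using iWARP RDMA.

```powershell
# Hypothetical destination server name; substitute your own.
$target = 'SECSERV01'

# SMB, SMB Direct and WS-MAN. Check Microsoft's current list before relying on this.
$ports = 445, 5445, 5985

foreach ($port in $ports) {
    # Test-NetConnection attempts a TCP connection to the given port.
    $result = Test-NetConnection -ComputerName $target -Port $port -WarningAction SilentlyContinue

    # Emit one summary row per port.
    [pscustomobject]@{
        Target    = $target
        Port      = $port
        Reachable = $result.TcpTestSucceeded
    }
}
```

Run it in both directions, once from each server, since the communication needs to be bi-directional.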

Now that you have all your hardware and software in place, you can configure server-to-server replication.

From the Windows Admin Center
Add the source server.
  Select the 'Add' button.
  Select 'Add server connection'.
  Add in the Server Name, then select Submit.
Now, on the 'All Connections' page, select the source server.
Select 'Storage Replica' from the 'Tools' panel.
Select 'New' to create a new partnership.
Provide the details of the partnership:
  The Source Server Name, RGname, Data Volume Name, Log Volume Name.
  The Destination Server Name, RGname, Data Volume Name, Log Volume Name.
and when all the partner data is entered, hit 'Create'.


Installing with PowerShell Commands

Here is the install process between two Windows servers using PowerShell commands. This process assumes the source server is called PRISERV01 and the target server is called SECSERV01. First you need to install the Storage Replica feature:

Install-WindowsFeature "Storage-Replica" -IncludeAllSubFeature

A server reboot will be needed once this command is entered.
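
If you would rather install the feature on both servers in one go, something like the following sketch should work. It assumes you run it from a separate management machine that has the Server Manager cmdlets available (through Remote Server Administration Tools, for instance), because -Restart will reboot each target server and you do not want that to be the machine running the loop.

```powershell
# Example server names used on this page; substitute your own.
$servers = 'PRISERV01', 'SECSERV01'

foreach ($server in $servers) {
    # -IncludeManagementTools adds the Storage Replica management tools,
    # -Restart reboots the target server automatically if the install requires it.
    Install-WindowsFeature -ComputerName $server -Name Storage-Replica -IncludeManagementTools -Restart
}

# Once the servers are back up, confirm the feature is reported as installed.
foreach ($server in $servers) {
    Get-WindowsFeature -ComputerName $server -Name Storage-Replica
}
```
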
Next you need to configure the disks on both servers. You need (at least) 2 disks on each server, one for the data and one for the logs. If your disks have different performance characteristics, then use the faster disks for the log volumes. The Get-Disk command with no parameters will list all the available disks on the system. The disks can be direct or SAN attached and can be real or virtual, and exactly how you configure your disks will depend on what kind you have. The Windows Disk page describes different kinds of disks and one way to format them.
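
As a rough illustration for the simplest case, two brand new, uninitialised disks on each server, the preparation might look like the sketch below. The disk numbers (1 and 2) and the volume labels are assumptions made up for this example. Check the Get-Disk output carefully before you run anything like this, as formatting the wrong disk will destroy its contents.

```powershell
# List the disks that have not been initialised yet (partition style RAW).
Get-Disk | Where-Object PartitionStyle -eq 'RAW'

# Hypothetical disk numbers: 1 = data disk, 2 = log disk.
# Storage Replica needs GPT, not MBR.
Initialize-Disk -Number 1 -PartitionStyle GPT
New-Partition -DiskNumber 1 -DriveLetter E -UseMaximumSize |
    Format-Volume -FileSystem NTFS -NewFileSystemLabel 'SRData'

Initialize-Disk -Number 2 -PartitionStyle GPT
New-Partition -DiskNumber 2 -DriveLetter F -UseMaximumSize |
    Format-Volume -FileSystem NTFS -NewFileSystemLabel 'SRLog'

# Confirm the result.
Get-Volume -DriveLetter E, F
```

Repeat the same steps on the other server so that both sides end up with matching data and log volumes.
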
So, assuming you have Storage Replica installed and all your disks online, Storage Replica has a good test tool that you can use to check that the network is fast enough and the server can deliver the performance needed for replication. This assumes the data and log volumes are 'e:' and 'f:'. You need to substitute your own server names, drive letters and the path for the report file.

Test-SRTopology -SourceComputerName "PRISERV01" -SourceVolumeName "e:" -SourceLogVolumeName "f:" -DestinationComputerName "SECSERV01" -DestinationVolumeName "e:" -DestinationLogVolumeName "f:" -DurationInMinutes 30 -ResultPath c:\Temp

If that report looks OK, you can then set up Replica with a PowerShell command. You should adjust the default log size from 8GB if the performance report suggests that is necessary. However note that the log size depends on how much write IO your workload performs. A larger or smaller log doesn't make you any faster or slower. A larger log simply means that more write blocks can be stored. Once the log fills up, the log wraps and older records are overwritten. This means that if you have a problem, a network outage for instance, then when the network returns, Replica can quickly resynchronise the Target with the Source by copying over the changed blocks that are recorded in the log files. You must never allow any other workloads to run on the log volume.
In the command below you need to substitute your own server names, resource groups and drive letters.

New-SRPartnership -SourceComputerName "PRISERV01" -SourceRGName rg01 -SourceVolumeName "e:" -SourceLogVolumeName "f:" -DestinationComputerName "SECSERV01" -DestinationRGName rg02 -DestinationVolumeName "e:" -DestinationLogVolumeName "f:"

You will get synchronous replication by default. If you want asynchronous replication, add the parameter '-ReplicationMode Asynchronous'.
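
As an illustration, an asynchronous partnership with a non-default log size could be created as below. This is a sketch using the same example server names, resource groups and drive letters as above; the 16GB log size is an arbitrary figure picked for the example, not a recommendation, so use whatever your Test-SRTopology report suggests.

```powershell
# Same example names as above; substitute your own.
# The backtick continues the command onto the next line.
New-SRPartnership -SourceComputerName "PRISERV01" -SourceRGName rg01 `
    -SourceVolumeName "e:" -SourceLogVolumeName "f:" `
    -DestinationComputerName "SECSERV01" -DestinationRGName rg02 `
    -DestinationVolumeName "e:" -DestinationLogVolumeName "f:" `
    -ReplicationMode Asynchronous `
    -LogSizeInBytes 16GB   # arbitrary example value
```
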
To find out if all is working, you can view the state for the replication source and destination using the following commands.

Get-SRGroup
Get-SRPartnership
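
The default output of those commands is fairly verbose. A small sketch that pulls out just the per-volume replication state is shown below. It assumes the destination server and resource group are SECSERV01 and rg02 as in the examples above, and the property names are the ones I would expect to see on the replica objects, so check them against your own Get-SRGroup output.

```powershell
# One row per replicated volume in the groups known to this server.
(Get-SRGroup).Replicas |
    Select-Object DataVolume, ReplicationMode, ReplicationStatus, NumOfBytesRemaining

# During the initial sync, check how much is left to copy to the destination.
(Get-SRGroup -ComputerName "SECSERV01" -Name rg02).Replicas |
    Select-Object DataVolume, NumOfBytesRemaining
```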

If you want to remove replication, run the following command. This command will retrieve ALL current Storage Replica partnerships that are on the machine you are running the command from, then pipe the list of partnerships to the 'Remove' command. It is also possible to remove individual partnerships.

Get-SRPartnership | Remove-SRPartnership
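
To remove a single partnership rather than all of them, you can name it explicitly. The sketch below assumes the example server names and resource groups used earlier on this page. Note that removing a partnership leaves the replication groups themselves in place, so a full clean-up needs a second step on each server.

```powershell
# Remove only the partnership between rg01 and rg02, leaving any others alone.
Remove-SRPartnership -SourceComputerName "PRISERV01" -SourceRGName rg01 `
    -DestinationComputerName "SECSERV01" -DestinationRGName rg02

# The replication groups still exist after the partnership is removed.
# To clear them as well, run this on each server. It removes ALL groups on that server.
Get-SRGroup | Remove-SRGroup
```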

To change the direction of the replication use this command. You need to substitute your own server names and resource groups.

Set-SRPartnership -NewSourceComputerName "SECSERV01" -SourceRGName rg02 -DestinationComputerName "PRISERV01" -DestinationRGName rg01


Hints and Troubleshooting

Replica Performance Statistics

For troubleshooting, Storage Replica logs events that you can check with PowerShell.

Get-WinEvent -ProviderName Microsoft-Windows-StorageReplica -MaxEvents 20
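
If the log is busy, it can be easier to look only at the errors and warnings. The following is a sketch of one way to do that, using the standard event levels (2 for Error, 3 for Warning). The choice of 20 events and the fields displayed are just examples.

```powershell
# Show only errors and warnings from the Storage Replica provider.
Get-WinEvent -FilterHashtable @{
    ProviderName = 'Microsoft-Windows-StorageReplica'
    Level        = 2, 3    # 2 = Error, 3 = Warning
} -MaxEvents 20 -ErrorAction SilentlyContinue |   # stay quiet if nothing matches
    Format-List TimeCreated, Id, LevelDisplayName, Message
```

Remember to check both the source and the destination server, as each logs its own events.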

There are also a lot of performance counters that can be viewed with PowerShell.
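
Rather than guess at counter names, you can ask the server which Storage Replica counters it has. This sketch assumes the counter set names start with "Storage Replica", which is what I would expect on a server with the feature installed; if nothing comes back, check the names in Performance Monitor.

```powershell
# List the Storage Replica counter sets registered on this server.
Get-Counter -ListSet 'Storage Replica*' |
    Select-Object CounterSetName, Description

# Show the individual counter paths inside those sets.
(Get-Counter -ListSet 'Storage Replica*').Paths
```

Any of the listed paths can then be passed to Get-Counter -Counter to sample its current value.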

Taking a Snapshot of the Destination Volume

This tip only applies to Windows Server 2019 onward. The problem is that a volume needs to be mounted before you can take a snapshot of it, and Storage Replica dismounts the destination volume when replication begins. To mount the replica, you must have an unused NTFS or ReFS formatted volume on the destination server that is not currently replicating. This volume is used for the snapshot logs, so you effectively see a point-in-time (PIT) copy of the replicated data.
For example, to create a test failover where you are replicating a volume "D:" in the Replication Group "rg1" on the destination server "SECSERV01" and have a "G:" drive on SECSERV01 that is not being replicated:

Mount-SRDestination -Name rg1 -Computername SECSERV01 -TemporaryPath G:\

The replicated volume D: is now accessible on SECSERV01. You can read and write to it normally, copy files off it, or run an online backup that you save elsewhere for safekeeping, under the D: path. The G: volume will only contain log data.
To remove the test failover snapshot and discard its changes:

Dismount-SRDestination -Name rg1 -Computername SECSERV01

You should only use the test failover feature for short-term temporary operations. It is not intended for long term usage. When in use, replication continues to the real destination volume.

Azure Support

Storage Replica supports Azure Cloud in these scenarios.


Welcome to Lascon Storage. This site provides hints and tips on how to manage your data, strategic advice and news items.