MSSQL Database Replication
MSSQL Replication
MSSQL allows you to replicate your databases locally, or to replicate to different servers over a LAN or via the internet. You may want to do this to create a copy of a database for development work, or you may want to maintain a remote copy of a database that is consistent with your local copy, for disaster recovery. You may also want to replicate to mobile devices so that mobile staff can work independently of the central office.
Replication Databases
Some replication terms explained
The basic unit of replication is an Article which can be a table or part of a table.
A Publisher is the source server or database that holds the base data that needs to be sent to another server or database.
A Subscriber is the target server or database that receives data from a publisher.
The act of sending articles to another server or database is called Publication. The act of receiving articles from a publisher is called Subscription. If a Publisher send out unsolicited articles to subscribers, this is called Push Subscription. If a Subscriber logs into a Publisher and requestes articles this is called Pull Subscription
The Distributor server manages the replication process, and can run stand-alone or as a combined Distributer / Publisher server. Replication information like copy jobs, pending transactions, replication status and history is held in the Distribution Database. The distribution database is used by the replication components of SQL Server, to store data including active transactions, snapshot jobs, synchronization status, and replication history information.
Types of Replication
There are three types of replication, Snapshot; Transactional and Merge. Transactional or Merge replication both start by applying a Snapshot to the Subscriber.
Snapshot replication is used to make a point-in-time copy of a database. You can schedule snapshot so it refreshes the replica on a regular basis. Snapshot replication is useful on its own for data that does not change often, or small files.
Transactional replication will capture changes made to articles in the Publisher database and store them in the distribution database. They are then sent out to the Subscribers in the correct transactional order. Transactional replication relies on the Subscribers being constantly connected to the Publisher and is generally more appropriate for large, very active databases. These subscriber databases should really be considered as read-only, as there is no way to get any updates back to the publisher.
Merge replication allows Subscriber users be more independent. They can work 'off-line' and update the Subscriber databases independently of the Publisher database. The Distributer then intercepts all updates and merges them together using a set of user created rules to handle collision updates. This sounds complicated and obviously has scope for data corruption, especially if a subscriber has been disconnected from the Publisher for a while. Potential conflict rules are; first update wins, treat a whole portion of data is a unit, merge data based on business logic. Merge replication is useful for mobile devices.
Backup and Recovery with Replication
The challenge with backup and recovery in a replicated environment is to make sure that the Publication database, the Distribution database and the Subscription databases all end up in a consistent state after a recovery. You need to backup your Production and Distribution databases, Subscriber databases are optional.
Publication Database
The data flow process is; data is updated in the Publication database and is then copied to the Distribution database where it is held until it is propagated out to all the Subscriber databases. If you restore the Publication database back to an earlier point, transactions may exist in the Distribution database that have been backed out from the Publication database
SQL Server 2000 has a parameter 'sync with backup' that ensures data consistency. The process that copies data out to the Distributer database is called the Log Reader Agent, which monitors the log for data changes and immediately propagates them to the distribution database. If you set parameter '0sync with backup' to 'true' the Log Reader Agent will not propagate any transactions until they have been backed up at the Publisher. This has three consequences
- You can guarantee that your backups are consistent with transactions passed to the Distributer
- You need to backup the publication database frequently
- It will take longer to replicate articles, as replication will wait for the next backup to complete.
Your choice is between performance and data integrity.
To ensure that your backups are consistent, execute the command
sp-replicationdboption '', 'sync with backup', 'true'.
If your performance is more important then you can run with ‘sync with backup’ set to ‘false’. If you need to restore the Publication database, you would then have to reconfigure replication to resychronise all the databases, or if you want, you can force replication to run with inconsistent data. This is not recommended, but if you really want to do it,
- Ignore any errors from the Log Reader Agent
- Run sp-replrestart to force replication to continue
- Configure the Distribution Agents with the SKIPERROR parameter so they ignore duplicate rows
Distribution database
In normal operation, the Log Reader Agent will scan the log on the Publication Database to find the next set of transactions that need to be replicated. It will then propagate the transaction out to the Distribution Database and once they are successfully commited it will tell the Publication Database. These transactions are then deleted from the log at the Publication Database and so cannot be propagated back out again if required.
This affects backup and restore of the Distribution database. If you back-level the Distribution database by restoring it, you cannot get it back up to date by re-applying transactions if the missing transactions have been deleted from the Publication database log.
To fix this, set the ‘sync with backup’ option to ‘true’ and also set trunc. log on chkpt. option of sp-dboption to false on the Distribution database. This will ensure that the Distribution database is backed up before the Publication database is allowed to truncate its logs. This will not delay replication, but it could increase the size of the Publication Database transaction log.
Subscription database
You do not have to backup Subscription databases. If you lose a Subscription database you can take a new snapshot then re-initialise the subscription.
If you want to be able to recover a subscription database, then change the minimum transaction retention period in the Distributor database from its default value of ‘0’ to a value that is bigger than the interval between Subscription database backups. Then when you restore a Subscriber, the Distribution Agent will propagate any transactions that are missing from the Subscriber.