BI Future Blog: SQL Server : A tutorial for building a replication between 2 machines (Part I)

Introduction

Replication is a set of technologies for copying and distributing database objects and data from one database to another database. After the intial configuration a synchronization process is running to maintain consistency.The distributor is kept in sync synchroneously with the publisher.

This blogpost is the first blogpost in a series of blogposts about replication on 2 machines. This series of blogpost is about a more complex configuration of replication than the blogpost I've written earlier.

These are the blogposts about this series about replication:

SQL Server : a tutorial for building a replication between 2 machines (Part I).
SQL Server : a tutorial for building a replication between 2 machines : Share (Part II).
SQL Server : a tutorial for building a replication between 2 machines : Distributor (Part III).
SQL Server : a tutorial for building a replication between 2 machines : Publisher (Part IV).
SQL Server : a tutorial for building a replication between 2 machines : Subscriber (Part V).
SQL Server : a tutorial for building a replication between 2 machines : Monitoring (Part VI).

The main parts

In this section I'll go into detail about replication. For replication, there are three main parts important, namely the publisher, the subscriber and distributor:

Publisher. The publisher has the data (publication) it can offer to other subscribers. There can be multiple subscribers.
Subscriber. The subscribers are the database servers that want to have the data from the publisher when the data is changed.
Distributor. The distributor keeps track of the subscribers and to ensure that the subscribers are notified of changes.

Below another picture I've borrowed from a SQLServerCentral.

Green is read and red is write

Initial data set

The initial dataset is the dataset that is initially captured on the publisher and stored on the subscriber. This is done by the snapshot agent. It is also possible to carry this out through a backup or SSIS.

Changed dataset

The log reader agent is running at the distributor. The log reader identifies INSERT, UPDATE, and DELETE statements or other changes marked for replication in the transaction log. The agent will then copy transactions in batches to the distribution database on the distributor. The distribution database is a store-and-forward queue and the final step is storing the changed data set on the subscriber.

The agents

The transactional replication of SQL Server is implemented with a Snapshot Agent, Log Reader Agent, Queue Reader Agent and a Distribution Agent:

The Snapshot Agent makes snapshot files based on the schema and data from the publisher of the databaseobjects and data. The Snapshot Agent places a shared lock on all tables ready for publishing while the BCP files are generated. This ensures transactional consistency of the data, but it ensures that other requests that attempt to write simultaneously are blocked. Depending on the size of the table this may be substantial. This is only the first time. During normal operations there are much finer-grained and very short-locks required.
The log reader Agent monitors the transaction log of each database which is configured for replication.. The Log Reader agent is responsible for copying the transactional log records in the distribution database. This is done according to the ACID principles. The LogReader connects the Publisher, and looking through the log files for records marked with the replication flag and copies the information in the distribution database. Transactions can be as long as the LogReader records when tranaction can't be removed from the log file due to problems. Growth of the log file can be the result. A possible cause may lie in the distributor that is out. This is viewable in the "log_reuse_wait_desc column column in the sys.database.
The distribution agent copies the initial snapshot files from the snapshot folder and the transactions in the distribution database to the subscriber. The distribution agent is responsible for the transfer of the distribution database to the subscription database. The distribution Agent connects with the distributor and reads the stored changes. Then, the changes will be processed in jacket order.
The Queue read agent is used together with transactional replication with updating. The agent runs on the distributor and move changes from the subscriber back to the publisher.

Transactional replication is not performed by SQL Server database engine but with windows programs that connect to the servers involved in the replication process. The agents are implemented by the SQL Server Jobs.

Important choices

For replication, there are several possible important choices, I've gathered so far:

Type of replication. Possible options: snapshot (eg 1 x daily.), Transactional Replication (almost realtime), merge replication (merge data from different sources, eg, salespeople).
Configuration. It is possible to place the distributor (monitors and supervises the replication process) on a separate machine.
Push and pull configuration.

When chosen for push the distributor gets the data from the Publisher and sends it to the subscriber.
For pull the distributor gets the data from the publisher and the subscribers retrieve the data from the Distributor.

Conclusion

In this blogpost I've desribed the main parts of replication, the agents and the important choices of replication.

Greetz,
Hennie

BI Future Blog

maandag 27 oktober 2014

SQL Server : A tutorial for building a replication between 2 machines (Part I)