Replication Methods define how Stitch will replicate your data during a replication job. While SaaS integration tables have their Replication Methods defined by Stitch, you can define how tables in database integrations are replicated.

Replication Methods are extremely important - we can’t stress this enough. They’ll not only directly impact your row count, but incorrectly defined methods can also cause data discrepancies.

Stitch employs three methods to replicate data from your data sources:


Log-based Incremental Replication

Log-based Incremental Replication is a replication method in which Stitch identifies modifications to records - including inserts, updates, and deletes - using a database’s binary log files. A binary log file is a record of events that occur within a database.

There are two types of binary log replication: statement and row-based. Stitch uses a row-based approach, which means that when rows are modified, the entire row is written to the binary log file. Stitch then reads the changes from the binary log and replicates the appropriate records.

Using Log-based Incremental Replication requires a specific database configuration and is only available for MySQL- and PostgreSQL-backed databases. For setup instructions, refer to the database integration documentation for your database.

Deleted record handling and Log-based Incremental

Depending on the method used to delete a record, Stitch may be able to capture the deleted record. Refer to the Deleted Record Handling guide for an explanation and examples.


Key-based Incremental Replication

Key-based Incremental Replication is a replication method in which Stitch identifies new and updated data using a column called a Replication Key. As this method will greatly reduce latency and data usage, we highly recommend using it where ever possible.

To identify new and updated data for replication, Stitch uses what is called a Replication Key.

Replication Keys

When Stitch replicates your data, it will store the last recorded maximum value in the Replication Key column and compare it against the data source - not what’s in your data warehouse - to identify new/updated data.

Any row with a Replication Key value greater than or equal to the stored value is where Stitch will begin the next replication job.

Updated At Incremental Replication

The updated_at method of Incremental Replication uses a timestamp or datetime column (set as the Replication Key) that’s updated whenever information in a row is changed. Stitch will compare the last recorded MAX value in this column with what’s in the data source to identify new and updated data for replication.

This method works for tables that will both add new rows and update existing ones.

For example: the Replication Key for the customers table below is the updated_at column.

During the initial replication job, the following rows are replicated:

| id [PK] | name        | age | updated_at          |
|---------+-------------+-----+---------------------|
| 1       | Finn        | 14  | 2017-01-01 14:34:23 |
| 2       | Jake        | 6   | 2017-01-01 14:34:23 |
| 3       | Beamo       | 1   | 2017-01-01 14:34:23 |
| 4       | Bubblegum   | 16  | 2016-12-31 02:45:53 |

When the replication job completes, Stitch will store the MAX value in the updated_at column (2017-01-01 14:34:23) to identify data for replication.

After the completion of the initial job but before the next job starts, these changes are made in the data source:

| id [PK] | name        | age | updated_at          |
|---------+-------------+-----+---------------------|
| 1       | Finn        | 15  | 2017-01-02 15:00:18 |   // age is changed in existing row
| 2       | Jake        | 6   | 2017-01-01 14:34:23 |
| 3       | Beamo       | 1   | 2017-01-01 14:34:23 |
| 4       | Bubblegum   | 16  | 2016-12-31 02:45:53 |
| 5       | King        | 300 | 2017-01-02 15:10:42 |   // new row is added

When the next replication job begins, Stitch will compare the stored updated_at value of 2017-01-01 14:34:23 against what’s in the data source to select rows to replicate. Anything greater than or equal to this value will be selected.

During the next replication job, the following rows are replicated:

| id [PK] | name  | age | updated_at          |
|---------+-------+-----+---------------------|
| 1       | Finn  | 15  | 2017-01-02 15:00:18 |
| 2       | Jake  | 6   | 2017-01-01 14:34:23 |
| 3       | Beamo | 1   | 2017-01-01 14:34:23 |
| 5       | King  | 300 | 2017-01-02 15:10:42 |

Instead of replicating the entire table again, only the rows with Replication Key values that were greater than or equal to the last value saved by Stitch were replicated. Notice that the record for Bubblegum (4) was not selected for replication because of its Replication Key value.

While there may be a small amount of duplication when using this method (it’s to ensure Stitch doesn’t miss any data), it’s the most efficient way to use Stitch.

Append-Only Incremental Replication

Append-Only Replication is when only new data is added, or appended, to a table. Existing rows are never updated. If a value is changed in an existing record, a new row with the new data will be appended to the end of the table.

This means that there can be many different rows in a table with the same Primary Key, each representing what that record was at that moment in time. Stitch provides the _sdc_sequence column to help distinguish the age of the records.

Initial replication job

Source table

In the source, the initial table looks like this:

id name type magic updated_at
1 Finn human false 2017-12-22 12:34:23
2 Jake dog true 2018-07-17 16:28:14


Stitch saves the maximum value from the Replication Key column (updated_at: 2018-07-17 16:28:14) to as a ‘bookmark’, which is where replication will pick up on the next replication job.

Destination table

The entire table is replicated in full to the destination:

id name type magic updated_at
1 Finn human false 2017-12-22 12:34:23
2 Jake dog true 2018-07-17 16:28:14

Second replication job

Source table

Some time later, new rows (id: 3, 4) are added to the source table:

id name type magic updated_at
1 Finn human false 2017-12-22 12:34:23
2 Jake dog true 2018-07-17 16:28:14
3 Bubblegum princess true 2018-07-17 18:37:56
4 Beamo game bot false 2018-07-17 18:34:49


Destination table

Stitch replicates all records where updated_at >= 2018-07-17 16:28:14, in this case id: 2, 3, 4, and appends them to the end of the table in the destination:

id name type magic updated_at
1 Finn human false 2017-12-22 12:34:23
2 Jake dog true 2018-07-17 16:28:14
2 Jake dog true 2018-07-17 16:28:14
3 Bubblegum princess true 2018-07-17 18:37:56
4 Beamo game bot false 2018-07-17 18:34:49


Note that:

  • id: 2 is in the table twice. This is because of the inclusive nature of Replication Keys, where records that are greater than or equal to the last saved maximum value are replicated.
  • Rows for records id: 3, 4 are appended to the end of the table
  • The maximum Replication Key value is updated to updated_at: 2018-07-17 18:37:56

Third replication job

Source table

What happens when an existing row is updated? In the source, id: 1 is now magic: true, which changes the record’s updated_at to 2018-07-20 09:18:32:

id name type magic updated_at
1 Finn human true 2018-07-20 09:18:32
2 Jake dog true 2018-07-17 16:28:14
3 Bubblegum princess true 2018-07-17 18:37:56
4 Beamo game bot false 2018-07-17 18:34:49


Destination table

Stitch replicates all records where updated_at: 2018-07-17 18:37:56. In this case, that’s records id: 1, 3, which will be appended to the end of the table in the destination:

id name type magic updated_at
1 Finn human false 2017-12-22 12:34:23
2 Jake dog true 2018-07-17 16:28:14
2 Jake dog true 2018-07-17 16:28:14
3 Bubblegum princess true 2018-07-17 18:37:56
4 Beamo game bot false 2018-07-17 18:34:49
1 Finn human true 2018-07-20 09:18:32
3 Bubblegum princess true 2018-07-17 18:37:56


Because existing rows aren’t updated in the destination, records id: 1, 3 are now in the table twice. For id: 1, note that the magic values are different.

Deleted record handling and Key-based Incremental

Stitch is only able to capture soft-deleted records in tables using Key-based Incremental Replication. Refer to the Deleted Record Handling guide for an explanation and examples.


Full Table Replication

Full Table Replication means that Stitch will replicate the entire contents of a table on every replication attempt. As this Replication Method can cause latency and quickly use up your monthly row limit, it’s the most inefficient way to use Stitch.

We recommend using Incremental Replication if the table in question contains any timestamped or datetime columns.

Note Stitch does not currently support Full Table Replication for Mongo integrations.

Deleted record handling and Full Table

Depending on the method used to delete a record, Stitch may be able to capture the deleted record. Refer to the Deleted Record Handling guide for an explanation and examples.


Learn about SaaS integration Replication Methods

As previously mentioned, Replication Methods can currently only be defined for tables in database integrations.

To learn more about the Replication Methods used by SaaS integration tables, refer to the Schema section in the SaaS integration docs.



Questions? Feedback?

Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.

Tags: replication