Full Table Replication is a replication method in which all rows in a table - including new, updated, and existing - are replicated during every replication job. In this guide, we’ll cover:

  1. How it works (with examples),
  2. When it should be used, and
  3. Limitations of this Replication Method

How Full Table Replication works

Tables that use Full Table Replication are replicated in full during each replication job. Regardless of whether a record is new or simply modified, all records in the table will be selected for extraction.

If Full Table Replication were a SQL query, it would look like this:

SELECT column_you_selected_1,
       column_you_selected_2,
       [...]
  FROM schema.table

When Full Table Replication should be used

Full Table Replication may be a good fit if:

  1. Records are hard deleted from the source.
  2. The table doesn’t contain a suitable column for Key-based Incremental Replication.
  3. Log-based Incremental Replication is unavailable for the source.
  4. For MongoDB-backed database integrations: The _id field contains only one data type. Refer to the Limitations section for more info.

Limitations of Full Table Replication

Before you select Full Table Replication as the Replication Method for a table, you should be aware of the limitations this method can have. Being aware of these limitations can help prevent data discrepancies and ensure your data is replicated in the most efficient manner possible.

The limitations of Full Table Replication are:

Limitation 1: Can cause latency

How large a source table is - that is, how many records the table contains - can affect how quickly Stitch is able to extract data from a source.

In the case of large tables using Full Table Replication, Stitch can only extract data as quickly as it is returned. This means that if a database or SaaS application returns data slowly, especially for a large table, latency in the replication process may increase. This is more probable with tables using Full Table Replication.

Limitation 2: Increased row consumption

Tables using Full Table Replication are replicated fully during every replication job, regardless of whether individual records were updated or not.

The more records a table contains, the more rows that will count towards usage. When paired with a high Replication Frequency, a single table can quickly consume an entire month’s row quota.

For example: A table contains 10,000 records and is using Full Table Replication. The integration’s Replication Frequency is every 30 minutes. The table below shows the number of rows replicated for the table per job as well as the total number used since the first job:

Job name Start time Rows replicated this job Total rows replicated
Job 1 00:00 10,000 10,000
Job 2 00:30 10,000 20,000
Job 3 01:00 10,000 30,000
Job 4 01:30 10,000 40,000
Job 5 02:00 10,000 50,000

If the integration were to continue replicating every 30 minutes until 11:59:59, this table would use 480,000 rows in 24 hours. Depending on the Stitch plan you’re using, this type of usage can quickly use up your row allotment.

Limitation 3: Unavailable for some integrations

Currently, Full Table Replication is unavailable for version v11-01-2016 of Stitch’s MongoDB integration. This version of MongoDB only supports Key-based Incremental Replication.

Full Table Replication is supported for all other versions of Stitch’s database and SaaS integrations.

Limitation 4: Multiple data types in the _id field can cause discrepancies (MongoDB)

Full Table Replication works a little differently for MongoDB-backed database integrations. For MongoDB, Stitch uses the _id field in MongoDB collections as a pseudo-Replication Key.

During the replication job, Stitch first determines the current maximum value of the _id column. Next, Stitch queries for documents with _id values that are less than or equal to the current maximum _id value.

For example: A query for Full Table Replication for a MongoDB collection would look like this:

SELECT field_you_selected_1,
       field_you_selected_2,
       [...]
  FROM schema.collection
 WHERE _id <= [max _id value]

The MongoDB integration functions this way to ensure that replication for the table can resume if a replication job is interrupted or doesn’t finish before the extraction job time limit.

Because MongoDB ranks BSON data types, this affects how the maximum _id value is determined. As a result, if multiple data types are present in the _id column, discrepancies may occur. Refer to the Missing Mongo data due to multiple data types guide for more info and examples.



Questions? Feedback?

Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.