Depending on the Replication Method being used and how records are deleted in the source, deletes may not be captured during the replication process.


Deletion methods

There are two methods that can be used to delete a source record:

  • Soft deletes, which will leave a record in the source and use a flag to indicate deletion, such as is_deleted or deleted_on. If the delete event updates the record’s Replication Key value, Stitch will detect and replicate the changes.
  • Hard deletes, which completely remove records from the source. It’s as if the record never existed. If using Key-based Incremental Replication, this will remove the record’s Replication Key value, which Stitch uses to identify new and updated records. Without a Replication Key value to check, Stitch can’t identify the change and update the record in the destination.

Delete support overview

In the table below are each of Stitch’s Replication Methods and the level at which each deletion method is supported.

Click the Replication Method name to check out examples of how each deletion method works with that specific Replication Method.

Replication Method Hard deletes Soft deletes
Log-based Incremental Sometimes supported

In the destination: Record will have a timestamp value in the _sdc_deleted_at column.

Note: Certain record event types may not be captured. See the Log-based Incremental section for more info and examples.

Supported

In the destination: Record will have an updated deleted flag value, if a flag is available in the source.

Key-based Incremental Not supported

In the destination: Record will remain in the destination table.

Supported

In the destination: Record will have an updated deleted flag value, if a flag is available in the source.

Full Table Sometimes supported

In the destination: Record may remain or be removed. See the Full Table section for more info and examples.

Supported

In the destination: Record will have an updated deleted flag value, if a flag is available in the source.


Log-based Incremental Replication

Click the tabs below to view examples for how each deletion method is handled using Log-based Incremental Replication.

Soft deletes with Log-based Incremental

In this example, we’ll demonstrate what happens when two recrds are soft-deleted in a source table using Log-based Incremental.

Soft delete: Initial job

During the initial job:

  • In the source: Stitch uses SELECT replication to replicate the table in full.

  • In the destination: Stitch loads the table in full.

Source
id name is_deleted
1 Finn false
2 Jake false
3 Bubblegum false
Destination
id name is_deleted
1 Finn false
2 Jake false
3 Bubblegum false
Soft delete: Second job

During the second job:

  • In the source: Records 1 and 3 are soft deleted (is_deleted: true).

    After the initial replication job, Stitch will switch to using the database’s binary log files for replication and its position in the binary log as a Replication Key, or bookmark.

    The binary log will contain an entry for updating the is_deleted values for records 1 and 3, which Stitch will detect and replicate.

  • In the destination: The is_deleted and updated_at values are updated for records records 1 and 3.

Source
id name is_deleted
1 Finn true
2 Jake false
3 Bubblegum true
Destination
id name is_deleted
1 Finn true
2 Jake false
3 Bubblegum true
Hard deletes with Log-based Incremental

In this example, we’ll demonstrate what happens when a record is hard deleted - or removed entirely - from a source table using Log-based Incremental. In this situation, Stitch will use a TIMESTAMP column named _sdc_deleted_at to indicate when records have been identified as deleted in the source.

Note: Identifying hard deleted records is dependent on how the record is deleted, as not all event types are recorded by databases in binary logs. For example: If a record is deleted using TRUNCATE in a MySQL database, Stitch will not detect the deletion.

The following event types will be detected by Log-based Incremental for MySQL- and PostgreSQL-backed database integrations:

  • INSERT
  • UPDATE
  • DELETE
Hard delete: Initial job

During the initial job:

  • In the source: Stitch uses SELECT replication to replicate the table in full.

  • In the destination: Stitch loads the table in full.

Source
id name
1 Finn
2 Jake
3 Bubblegum
Destination
id name _sdc_deleted_at
1 Finn
2 Jake
3 Bubblegum
Hard delete: Second job

During the second job:

  • In the source: Record 2 is hard deleted.

    After the initial replication job, Stitch will switch to using the database’s binary log files for replication and its position in the binary log as a Replication Key, or bookmark.

    If the record is deleted using a command that is recorded by the binary log, the log file will contain an entry for record 2’s deletion. Stitch will detect and replicate the updated record.

  • In the destination: Record 2 will remain in the table, but have a timestamp in the _sdc_deleted_at column indicating when the deletion took place.

Source
id name
1 Finn
2 Jake
3 Bubblegum
Destination
id name _sdc_deleted_at
1 Finn
2 Jake 2018-07-28 04:05:00
3 Bubblegum

Key-based Incremental Replication

Click the tabs below to view examples for how each deletion method is handled using Key-based Incremental Replication.

Soft deletes with Key-based Incremental

In this example, we’ll demonstrate what happens when two records are soft-deleted in the source. This table uses the is_deleted column as a deletion flag, which will be true if the record is deleted.

Soft delete: Initial job

During the initial job:

  • In the source: Stitch replicates the entire table, saving updated_at: 2018-07-27 12:07:00 as the maximum Replication Key value.

  • In the destination: Stitch loads the entire table in full.

Source
id updated_at is_deleted
1 2018-07-27 12:00:00 false
2 2018-07-27 12:05:00 false
3 2018-07-27 12:07:00 false
Destination
id updated_at is_deleted
1 2018-07-27 12:00:00 false
2 2018-07-27 12:05:00 false
3 2018-07-27 12:07:00 false
Soft delete: Second job

During the second job:

  • In the source: Records 1 and 3 are soft deleted (is_deleted: true). The Replication Key value (updated_at) is updated.

    Stitch replicates all records with an updated_at value >= 2018-07-27 12:07:00.

  • In the destination: The is_deleted and updated_at values are updated for records 1 and 3.

Source
id updated_at is_deleted
1 2018-07-28 04:00:00 true
2 2018-07-27 12:05:00 false
3 2018-07-28 04:05:00 true
Destination
id updated_at is_deleted
1 2018-07-28 04:00:00 true
2 2018-07-27 12:05:00 false
3 2018-07-28 04:05:00 true
Hard deletes with Key-based Incremental

In this example, we’ll demonstrates what happens when a record is hard-deleted - or removed entirely - from the source.

Note: To remove hard deletes from destination tables replicated in this way, you’ll need to:

  1. Temporarily drop the table in the destination
  2. Reset the table’s Replication Keys
  3. Wait for Stitch to re-create the table during the next replication job

These steps must be done in this order, or you may cause issues with replication.

Hard delete: Initial job

During the initial job:

  • In the source: Stitch replicates the entire table, saving updated_at: 2018-07-27 12:07:00 as the maximum Replication Key value.

  • In the destination: Stitch loads the entire table in full.

Source
id updated_at
1 2018-07-27 12:00:00
2 2018-07-27 12:05:00
3 2018-07-27 12:07:00
Destination
id updated_at
1 2018-07-27 12:00:00
2 2018-07-27 12:05:00
3 2018-07-27 12:07:00
Hard delete: Second job

During the second job:

  • In the source: Before the second job begins, record 2 is hard deleted and removed entirely from the source. Records 1 and 3 are updated.

    During the job, Stitch replicates all records with an updated_at value >= 2018-07-27 12:07:00.

  • In the destination: The updated_at values for records 1 and 3 are updated.

    Record 2 remains in the destination table.

Source
id updated_at
1 2018-07-28 04:15:00
3 2018-07-28 04:07:00
Destination
id updated_at
1 2018-07-28 04:15:00
2 2018-07-27 12:05:00
3 2018-07-28 04:07:00

Full Table Replication

Click the tabs below to view examples for how each deletion method is handled using Full Table Replication.

Soft deletes with Full Table

In this example, we’ll demonstrate what happens when two records are soft-deleted in the source. This table uses the is_deleted column as a deletion flag, which will be true if the record is deleted.

Soft delete: Initial job

During the initial job:

  • In the source: Stitch replicates the entire table.

  • In the destination: Stitch loads the entire table in full.

Source
id is_deleted
1 false
2 false
3 false
Destination
id is_deleted
1 false
2 false
3 false
Soft delete: Second job

During the second job:

  • In the source: Records 1 and 3 are soft deleted (is_deleted: true). Stitch replicates the entire table.

  • In the destination: Stitch overwrites the table in the destination, resulting in the is_deleted values for records 1 and 3 being updated.

Source
id is_deleted
1 true
2 false
3 true
Destination
id is_deleted
1 true
2 false
3 true
Hard deletes with Full Table

In this example, we’ll demonstrates what happens when a record is hard-deleted - or removed entirely - from the source.

Hard delete: Initial job

During the initial job:

  • In the source: Stitch replicates the entire table.

  • In the destination: Stitch loads the table in full.

Source
id name
1 Finn
2 Jake
3 Bubblegum
Destination
id name
1 Finn
2 Jake
3 Bubblegum
Hard delete: Second job

During the second job:

  • In the source: Before the second job begins, record 2 is hard deleted and removed entirely from the source.

    Stitch replicates the table in full.

  • In the destination: Stitch deletes and recreates the table in a single transaction, loading the replicated table in full.

Source
id name
1 Finn
3 Bubblegum
Destination
id name
1 Finn
3 Bubblegum
Hard deletes with Full Table in BigQuery

In this example, we’ll demonstrates what happens when a record is hard-deleted - or removed entirely - from the source.

How hard deletes are handled in BigQuery depends on the integration:

  • If you can define Replication Methods for an integration, Stitch will delete and re-create tables in a single transaction. This is the scenario described in the Hard deletes tab for this section.
  • Otherwise, Stitch will append the table in full to the destination table. This is the scenario we’ll outline here.
Hard delete in BigQuery: Initial job

During the initial job:

  • In the source: Stitch replicates the entire table.

  • In the destination: Stitch loads the table in full.

Source
id name
1 Finn
2 Jake
3 Bubblegum
Destination
id name
1 Finn
2 Jake
3 Bubblegum
Hard delete in BigQuery: Second job

During the second job:

  • In the source: Before the second job begins, record 2 is hard deleted and removed entirely from the source.

    Stitch replicates the table in full.

  • In the destination: Stitch appends the replicated data to the end of the table in the destination. In this case, records 1 and 3 would be appended.

Source
id name
1 Finn
3 Bubblegum
Destination
id name
1 Finn
2 Jake
3 Bubblegum
1 Finn
3 Bubblegum


Questions? Feedback?

Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.

Tags: replication