Append-Only Replication is a type of Incremental Replication where newly replicated data is appended to the end of a table. Existing rows are not updated - any updates will be added to the table as new rows.

For tables using Incremental Replication, Stitch currently loads data into Google BigQuery in an append-only fashion. This means that as time goes on, tables will wind up containing many different versions of the same row.

Data stored this way can provide insights and historical details about how those rows have changed over time - creating a timeline of the status changes of an order record, for example - but in some cases, you might just want the latest version of the table.

Grabbing the Latest Version of Every Row

In each Stitch-generated integration table, you’ll see a few columns prepended with _sdc. The column we’ll focus on here is the _sdc_sequence column. This column is a Unix epoch (down to the milisecond) attached to the record during replication and can help determine the order of all the versions of a row.

Stitch uses these sequence values in a few places to correctly order rows for loading, but it can be also used to grab the latest version of a record in an append-only table.

Let’s take a look at an example. Assume we have an orders table that contains:

  • A Primary Key of id,
  • The system _sdc columns added by Stitch, and
  • A whole lot of other order attribute columns.

If you wanted to create a snapshot of the latest version of this table, you could run a query like this:

FROM [stitch-analytics-bigquery-123:ecommerce.orders] o
        MAX(_sdc_sequence) AS seq,
    FROM [stitch-analytics-bigquery-123:ecommerce.orders]
    GROUP BY id ) oo
ON =
AND o._sdc_sequence = oo.seq

This approach uses a subquery to get a single list of every row’s Primary Key and maximum sequence number. Since it’s possible to have duplicate records in your warehouse, the query also selects only distinct records of the latest version of the row. It then joins the original table to both the Primary Key and maximum sequence, which makes all other column values available for querying.

Creating Views

To make this easier, you can turn queries like the one above into a view. We recommend this approach because a view will encapsulate all the logic and simplify the process of querying against the latest version of your data.

For more info on creating views in the data warehouses Stitch supports, check out these docs:

Questions? Feedback?

Did this article help? If you have questions or feedback, please reach out to us.