For each integration that you add to Stitch, a schema specific to that integration will be created in your data warehouse. The integration’s schema is where all the data Stitch replicates from the data source will be stored.


Integration Schema Names

When you create an integration, you’re asked to provide a name for the integration. This name is used to create the integration’s schema in your data warehouse.

Naming an integration

If you want the schema name to be different than the display name, click the Change link below the Integration Name field and enter a new name in the field that displays:

Using different names for schemas & display names.

Note that after you save an integration, its schema name can’t be changed.

Changing & Re-using Schema Names

Changing an integration’s schema name requires you to create a new integration and re-sync any replicated data.

Schema names from deleted integrations can be re-used. However, if a naming collision occurs (two schema names canonicalize to the same name) the destination may reject the data - deleting an integration in the Stitch app won’t delete that integration’s schema or data from your data warehouse.


Integration Schema Composition

Schemas created by Stitch contain that integration’s tables and one additional table called ``, which serves as a log for data loading issues.

The Stitch schema structure.

Aside from ``, the tables that Stitch creates in integration schemas depends on the type of integration.

Database Integration Tables

For database integrations, the schema in your destination will mirror your data source and contain only the tables and columns you set to sync.

There is one exception to this, however: MongoDB integrations don’t support whitelisting at the field level. All fields contained in a collection are automatically set to sync when a collection is selected for replication.

SaaS Integration Tables

For SaaS integrations, the schema depends on whether the integration supports whitelisting:

  • If whitelisting is supported, the schema will contain only the tables (and columns, for some integrations) you set to sync. A full list of whitelist-enabled integrations can be found here.
  • If whitelisting isn’t supported, all available tables and columns in the integration will be synced to your data warehouse. You can find out more about the available tables and their Replication Methods in the Schema section of the SaaS Integration docs.

Integration Table Schemas

The integration tables in the schema contain the replicated data from tables set to sync. Note that if you unsync a table, doing so won’t remove that table’s data from your data warehouse.

Data Storage

How your data is stored in the schemas and tables created by Stitch depends on a few things:

  • How data is structured in that particular data source (ex: use of nested data structures),
  • Any changes you might make to the data source (ex: adding/removing a column),
  • Stitch-specific data handling rules (ex: entirely NULL columns), and
  • How your destination handles data (ex: columns with mixed data types, nested data structures)

Stitch will encounter dozens of scenarios when replicating and loading your data. Familiarizing yourself with these scenarios and the nuances of your destination will enable you to better understand your data’s structure and efficiently troubleshoot if issues arise.

To learn more about how handles these scenarios, check out the Data Loading guide for your destination.

_sdc Columns

In addition to the columns set to sync in these tables, there are also a few columns prepended with _sdc. Stitch uses these columns to replicate your data.

Don’t remove these columns, as doing so will cause replication issues in Stitch.

_sdc_received_at
TIMESTAMP

Timestamp indicating when Stitch received the record for loading.

_sdc_batched_at
TIMESTAMP

Timestamp indicating when Stitch loaded the batch the record was a part of into the data warehouse.

_sdc_sequence
INTEGER

A Unix epoch (in miliseconds) that indicates the order in which data points were considered for loading.

_sdc_table_version
INTEGER

Indicates the version of the table. This column is used to complete a switch-view operation, when necessary.

_sdc_source_key_[pk-name]
VARIES

Applicable only to subtables, this column forms part of a composite Primary Key for the table. It contains the top level table’s Primary Key and will also inherit the Primary Key’s data type.

For example: there is a top-level table (orders) with a Primary Key of order_id (INTEGER). If a subtable were created from orders, this column would be named _sdc_source_key_order_id, be typed as an INTEGER, and contain the same values.

_sdc_level_0_id
INTEGER

Applicable only to subtables, this column forms part of a composite Primary Key for the table and can be used to associate further nested records to the parent.

Values in this column will auto-increment for each unique record in the table, beginning with 0. When used with the _sdc_primary_key column(s), a unique identifier for the row is created.

Depending on the level of nesting, there may be multiple _sdc_level_id columns in a subtable.

_sdc_primary_key
STRING

This column acts as a Primary Key if Stitch doesn’t detect one. Stitch will use this column to de-dupe data.

_sdc_replication_id
STRING

Applicable only to database integrations, this column contains an auto-generated ID that Stitch injects when a Primary Key isn’t detected. Stitch will use this column to de-dupe data.

_sdc_customer_id
STRING

Applicable only to Google AdWords integrations, this column contains the AdWords Account ID the data in the row pertains to.

For example: the ad_performance_report table may contain data from several AdWords accounts. The value in the _sdc_customer_id column allows you to associate records with the correct AdWords account.

_sdc_report_datetime
DATE-TIME

Applicable only to Google AdWords inetgrations, this column indicates the starting time of the Stitch job that extracted the relevant row’s data.


Rejected Records Log

Occasionally, Stitch will encounter data that it can’t load into your destination. For example: a table contains more columns than the destination’s supported limit.

On your end, this will usually look like you’re missing data. When Stitch is unable to load data, however, the occurrence will be logged in a table named _sdc_rejected. Every integration schema created by Stitch will include this table as well as the other tables in the integration.

The _sdc_rejected table acts as a log of data rejections and includes the reason, as much of the record that can fit, and the date of the occurrence.

This rejection log can be useful for investigating data discrepancies and troubleshooting errors surfaced during the data loading process.

To learn more about this table, check out the Rejected Records Log guide.



Questions? Feedback?

Did this article help? If you have questions or feedback, please reach out to us.

Tags: replication