For each integration that you add to Stitch, a schema specific to that integration will be created in your data warehouse. The integration’s schema is where all the data Stitch replicates from the data source will be stored.
Integration Schema Names
When you create an integration, you’re asked to provide a name for the integration. This name is used to create the integration’s schema in your data warehouse.
If you want the schema name to be different than the display name, click the Change link below the Integration Name field and enter a new name in the field that displays:
Note: After you save an integration, its schema name can’t be changed.
Changing & Re-using Schema Names
Changing an integration’s schema name requires you to create a new integration and fully re-replicate all historical data.
Schema names from deleted integrations can be re-used. However, if a naming collision occurs (two schema names canonicalize to the same name) the destination may reject the data - deleting an integration in the Stitch app won’t delete that integration’s schema or data from your data warehouse.
Integration Schema Composition
Schemas created by Stitch contain that integration’s tables and one additional table called
_sdc_rejected, which serves as a log for data loading issues.
_sdc_rejected, the tables that Stitch creates in integration schemas depends on the type of integration.
Database Integration Tables
For database integrations, the schema in your destination will mirror your data source and contain only the tables and columns you set to replicate.
There is one exception to this, however: MongoDB integrations don’t support whitelisting at the field level. All fields contained in a collection are automatically set to replicate when a collection is selected for replication.
SaaS Integration Tables
For SaaS integrations, the schema depends on whether the integration supports whitelisting:
- If whitelisting is supported, the schema will contain only the tables (and columns, for some integrations) you set to replicate. A full list of whitelist-enabled integrations can be found here.
- If whitelisting isn’t supported, all available tables and columns in the integration will be replicated to your data warehouse. You can find out more about the available tables and their Replication Methods in the Schema section of the SaaS Integration docs.
Integration Table Schemas
The integration tables in the schema contain the replicated data from tables set to replicate. Note: If you de-select a table from replication, doing so won’t remove that table’s data from your destination.
How your data is stored in the schemas and tables created by Stitch depends on a few things:
- How data is structured in that particular data source (ex: use of nested data structures),
- Any changes you might make to the data source (ex: adding/removing a column),
- Stitch-specific data handling rules (ex: entirely
- How your destination handles data (ex: columns with mixed data types, nested data structures)
Stitch will encounter dozens of scenarios when replicating and loading your data. Familiarizing yourself with these scenarios and the nuances of your destination will enable you to better understand your data’s structure and efficiently troubleshoot if issues arise.
To learn more about how handles these scenarios, check out the Data Loading guide for your destination.
In addition to the columns set to replicate in these tables, there are also a few columns prepended with
_sdc. Stitch uses these columns to replicate your data.
Don’t remove these columns, as doing so will cause replication issues in Stitch.
Timestamp indicating when Stitch received the record for loading.
Timestamp indicating when Stitch loaded the batch the record was a part of into the data warehouse.
A Unix epoch (in miliseconds) that indicates the order in which data points were considered for loading.
Indicates the version of the table. This column is used to complete a switch-view operation, when necessary.
Applicable only to subtables, this column forms part of a composite Primary Key for the table. It contains the top level table’s Primary Key and will also inherit the Primary Key’s data type.
For example: there is a top-level table (
Applicable only to subtables, this column forms part of a composite Primary Key for the table and can be used to associate further nested records to the parent.
Values in this column will auto-increment for each unique record in the table, beginning with 0. When used with the
Depending on the level of nesting, there may be multiple
This column acts as a Primary Key if Stitch doesn’t detect one. Stitch will use this column to de-dupe data.
Applicable only to database integrations, this column contains an auto-generated ID that Stitch injects when a Primary Key isn’t detected. Stitch will use this column to de-dupe data.
Applicable only to Google AdWords integrations, this column contains the AdWords Account ID the data in the row pertains to.
For example: the
Applicable only to Google AdWords inetgrations, this column indicates the starting time of the Stitch job that extracted the relevant row’s data.
Rejected Records Log
Occasionally, Stitch will encounter data that it can’t load into your destination. For example: a table contains more columns than the destination’s supported limit.
On your end, this will usually look like you’re missing data. When Stitch is unable to load data, however, the occurrence will be logged in a table named
_sdc_rejected. Every integration schema created by Stitch will include this table as well as the other tables in the integration.
_sdc_rejected table acts as a log of data rejections and includes the reason, as much of the record that can fit, and the date of the occurrence.
This rejection log can be useful for investigating data discrepancies and troubleshooting errors surfaced during the data loading process.
To learn more about this table, check out the Rejected Records Log guide.