Learn how Stitch will load data from your integrations into version 2 of Stitch’s Google BigQuery destination.

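In this guide, we’ll cover data loading scenarios involving:

  • Primary Keys
  • Replication Keys
  • Object naming
  • Tables
  • Data typing
  • Schema changes
  • Destination changes
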

Primary Key scenarios

Scenarios involving Primary Key columns.

IF

A table without a Primary Key is replicated.

THEN

Table is created with an __sdc_primary_key column as the Primary Key. Data for the table will be loaded in an Append-Only manner, regardless of the loading behavior selected for the destination.

The Primary Key information for the table is stored in the _sdc_primary_keys table in the integration’s schema.

Refer to the Understanding loading behavior guide for more info and examples.

IF

A table with a single Primary Key is replicated.

THEN

Table is created.

The Primary Key information for the table is stored in the _sdc_primary_keys table in the integration’s schema.

IF

A table with multiple Primary Keys is replicated.

THEN

Table is created with an __sdc_primary_key column as the Primary Key. Data for the table will be loaded in an Append-Only manner, regardless of the loading behavior selected for the destination.

The Primary Key information for the table is stored in the _sdc_primary_keys table in the integration’s schema.

Refer to the Understanding loading behavior guide for more info and examples.

IF

The table’s Primary Key(s) is/are changed.

THEN

Changing a table’s Primary Key(s) is not permitted in Google BigQuery. This includes changing the Primary Key(s) in the source, or adding a Primary Key to a table that didn’t previously have one.

If Primary Key columns are changed, Stitch will stop processing data for the table.

AND

The following error will display in the Notifications tab in Stitch:

Primary key change is not permitted

FIX IT

Reinstate the table’s Primary Key(s) to allow Stitch to continue processing data for the table.

IF

You remove the Primary Key column(s) for a table in Google BigQuery.

THEN

Changing a table’s Primary Key(s) is not permitted in Google BigQuery.

If Primary Key columns are changed, Stitch will stop processing data for the table.

AND

The following error will display in the Notifications tab in Stitch:

Primary key change is not permitted

FIX IT

Reinstate the table’s Primary Key(s) to allow Stitch to continue processing data for the table.


Replication Key scenarios

Scenarios involving Replication Keys and how data is loaded as a result.

IF

A table using Key-based Incremental Replication is replicated where the Replication Key column contains NULL values.

THEN
  • During the initial job, the table will be created and all rows will be replicated.
  • During subsequent jobs, only rows with populated Replication Keys will be replicated and persisted to Google BigQuery.
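
The behavior above can be sketched in Python as a small in-memory simulation. This is not Stitch’s implementation: `rows_to_replicate` is a hypothetical helper, and the "greater than or equal to the last saved value" comparison is an assumption borrowed from the Replication Key rule in the schema change scenarios of this guide.

```python
from typing import Any, Optional


def rows_to_replicate(
    rows: list[dict[str, Any]], key: str, last_max: Optional[Any]
) -> list[dict[str, Any]]:
    """Pick the rows a job would replicate for a Key-based Incremental table."""
    if last_max is None:
        # Initial job: no Replication Key value has been saved yet,
        # so every row is replicated, including rows with NULL keys.
        return list(rows)
    # Subsequent jobs: rows with NULL keys are skipped; the rest are
    # compared against the last saved maximum value (assumed >=).
    return [r for r in rows if r.get(key) is not None and r[key] >= last_max]


rows = [
    {"id": 1, "updated_at": 10},
    {"id": 2, "updated_at": None},
    {"id": 3, "updated_at": 25},
]
initial = rows_to_replicate(rows, "updated_at", None)  # all three rows
later = rows_to_replicate(rows, "updated_at", 20)      # only the row with id 3
```

In this sketch, the row with a NULL `updated_at` is picked up only by the initial job.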


Object naming scenarios

Scenarios involving object identifiers in the destination, including naming limitations and transformations.

IF

A table name contains more characters than allowed by Google BigQuery.

THEN

Google BigQuery will reject all data for the table.

AND

The following error will display in the Notifications tab in Stitch:

Table name [TABLE] is too long for Google BigQuery

Rejected records will be logged in the _sdc_rejected table of the integration's schema.

FIX IT

If possible, change the table name in the source so that it fits within Google BigQuery’s limit of 1,024 characters.

Use the _sdc_rejected table to identify the root of the issue.

IF

A column name contains more characters than allowed by Google BigQuery.

THEN

Google BigQuery will reject columns with names that exceed the column character limit. Other columns in the table will persist to Google BigQuery.

AND

The following error will display in the Notifications tab in Stitch:

Column name [COLUMN] is too long for Google BigQuery

Rejected records will be logged in the _sdc_rejected table of the integration's schema.

FIX IT

If possible, change the column name in the source so that it fits within Google BigQuery’s limit of 128 characters.

Use the _sdc_rejected table to identify the root of the issue.
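
To catch over-long names before they are replicated, a check along the following lines may help. It is a minimal Python sketch, not part of Stitch: `find_long_names` is a hypothetical helper, the limits are the ones cited in this guide, and it assumes a name exactly at the limit is still accepted.

```python
TABLE_NAME_LIMIT = 1024   # table name limit cited in this guide
COLUMN_NAME_LIMIT = 128   # column name limit cited in this guide


def find_long_names(table: str, columns: list[str]) -> dict[str, list[str]]:
    """Return the table and column names that exceed the cited limits."""
    return {
        # Assumption: only names longer than the limit are rejected.
        "table": [table] if len(table) > TABLE_NAME_LIMIT else [],
        "columns": [c for c in columns if len(c) > COLUMN_NAME_LIMIT],
    }


report = find_long_names("orders", ["id", "x" * 129])
# report["columns"] holds the single 129-character column name
```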

IF

Two columns are replicated that canonicalize to the same name. For example: a table containing both CustomerId and customerid columns.

THEN

Google BigQuery will reject the records and create a log for the rejected records in the _sdc_rejected table in that integration’s schema.

AND

The following error will display in the Notifications tab in Stitch:

Field collision on [COLUMN_NAME]

Rejected records will be logged in the _sdc_rejected table of the integration's schema.

FIX IT

If possible, rename one of the columns in the source so that both column names will be unique when replicated to Google BigQuery.

Use the _sdc_rejected table to identify the root of the issue.
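
Collisions like the CustomerId and customerid example can be spotted in the source ahead of time. The Python sketch below is a simplified check: `find_collisions` is a hypothetical helper that only compares names case-insensitively, so collisions caused by other name transformations would need the same treatment.

```python
from collections import defaultdict


def find_collisions(columns: list[str]) -> dict[str, list[str]]:
    """Group source column names that are identical once lowercased."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in columns:
        groups[name.lower()].append(name)
    # Keep only the lowercased names shared by more than one source column.
    return {canon: names for canon, names in groups.items() if len(names) > 1}


collisions = find_collisions(["CustomerId", "customerid", "order_id"])
# {'customerid': ['CustomerId', 'customerid']}
```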

IF

A column is replicated that has a mixed-case name.

THEN

Google BigQuery will convert letters to lowercase. For example:

Columns in Source   Columns in Google BigQuery
CuStOmErId   customerid
customerID   customerid

IF

A column is replicated that has a name with spaces.

THEN

Google BigQuery will convert spaces to underscores. For example:

Columns in Source   Columns in Google BigQuery
customer id   customer_id
CUSTOMER ID   customer_id

IF

A column is replicated with a name that contains unsupported special characters.

THEN

Google BigQuery will convert special characters to underscores. For example:

Columns in Source   Columns in Google BigQuery
customer!id   customer_id
!CUSTOMERID   _customerid
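
The three transformations above (lowercasing, spaces to underscores, unsupported special characters to underscores) can be approximated with a short Python sketch. It is neither Stitch’s nor Google BigQuery’s implementation: `canonicalize` is a hypothetical helper, and it assumes "unsupported" means any character other than an ASCII letter, digit, or underscore, which this guide does not spell out.

```python
import re


def canonicalize(name: str) -> str:
    """Approximate the column name transformations shown in the tables above."""
    lowered = name.lower()
    # Assumption: spaces and every other character outside [a-z0-9_]
    # are each replaced by a single underscore.
    return re.sub(r"[^a-z0-9_]", "_", lowered)


examples = ["CuStOmErId", "customer id", "customer!id", "!CUSTOMERID"]
converted = [canonicalize(name) for name in examples]
# ['customerid', 'customer_id', 'customer_id', '_customerid']
```

The sketch leaves names that begin with a non-letter untouched; that case is covered by a separate scenario.
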
IF

A column is replicated with a name that begins with a non-letter.

THEN

Google BigQuery will remove all leading non-letter characters with the exception of leading underscores. For example:

Columns in Source   Columns in Google BigQuery
123customerid   _customerid
_customerid   _customerid
_987CUSTOMERID   _987CUSTOMERID


Table scenarios

Scenarios involving table creation and modification in the destination.

IF

A table contains entirely NULL columns.

THEN

No table is created in Google BigQuery. At least one column must have a non-NULL value for Stitch to create a table in Google BigQuery.

IF

A table arrives with more columns than Google BigQuery allows.

THEN

Google BigQuery will reject all data for the table.

AND

The following error will display in the Notifications tab in Stitch:

ERROR: too many columns

Rejected records will be logged in the _sdc_rejected table of the integration's schema.

FIX IT

If possible, deselect some columns to allow Stitch to load data into Google BigQuery for the table. Google BigQuery limits the number of columns a table can have.

Use the _sdc_rejected table to identify the root of the issue.


Data typing scenarios

Scenarios involving various data types, including how data is typed and structured in the destination.

IF

Stitch detects multiple data types for a single column.

THEN

To accommodate data of varying types, Stitch will create multiple columns to ensure data is loaded with the correct type. In the destination, this will look like the column has been “split”.

For example: Stitch first detected that order_confirmed contained BOOLEAN data, but during a subsequent job, detected STRING values. To accommodate data of varying types, Stitch will:

  1. Store data for the original data type in the original column. In this example, only BOOLEAN values will be stored in order_confirmed. The name of the original column will not change.

  2. Create additional columns to store the other data types - one for each data type detected - and append the data type to the column name. In this example, an order_confirmed__st column will be created to store STRING values.

Refer to TODO for more info and examples.
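
The column "split" described above can be illustrated with a small Python sketch. It is a simplified model rather than Stitch’s implementation: `split_record` and `TYPE_SUFFIXES` are hypothetical names, and only the `__st` and `__int` suffixes that appear in this guide are mapped, so other types raise a `KeyError` instead of guessing a suffix.

```python
from typing import Any

# Suffixes for split columns. Only the two suffixes that appear in this
# guide are listed; the full mapping is deliberately not guessed.
TYPE_SUFFIXES = {str: "__st", int: "__int"}


def split_record(record: dict[str, Any], original_types: dict[str, type]) -> dict[str, Any]:
    """Route each value to its original column or to a type-suffixed column."""
    out: dict[str, Any] = {}
    for column, value in record.items():
        if value is None:
            continue  # NULLs carry no type information
        # The first non-NULL type seen for a column becomes its original type.
        first_type = original_types.setdefault(column, type(value))
        if type(value) is first_type:
            out[column] = value
        else:
            out[column + TYPE_SUFFIXES[type(value)]] = value
    return out


seen_types: dict[str, type] = {}
first = split_record({"order_confirmed": True}, seen_types)
second = split_record({"order_confirmed": "yes"}, seen_types)
# first  == {'order_confirmed': True}
# second == {'order_confirmed__st': 'yes'}
```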

IF

Data is replicated to Google BigQuery that is nested, containing many top-level properties and potentially nested sub-properties.

THEN

Nested records and objects are maintained. Refer to the Nested data structures in Google BigQuery guide for more info and examples.

IF

A VARCHAR column is replicated to Google BigQuery.

THEN

Google BigQuery will store all VARCHAR data as STRING.

IF

VARCHAR data is loaded that exceeds the current maximum size for the column.

THEN

No widening will occur. Google BigQuery will store all VARCHAR data as STRING.

IF

A column containing date data with timezone info is replicated to Google BigQuery.

THEN

Google BigQuery has no support for timezones.

IF

A column contains timestamp data that is outside Google BigQuery’s supported range.

THEN

Google BigQuery will reject the records that fall outside the supported range.

AND

The following error will display in the Notifications tab in Stitch:

timestamp out of range for Google BigQuery on [TIMESTAMP]

Rejected records will be logged in the _sdc_rejected table of the integration's schema.

FIX IT

To resolve the error, offending values in the source must be changed to be within Google BigQuery’s timestamp range.

Use the _sdc_rejected table to identify the root of the issue.

IF

A column contains integer data that is outside Google BigQuery’s supported range.

THEN

Google BigQuery will reject the records that fall outside the supported range.

AND

The following error will display in the Notifications tab in Stitch:

integer out of range for Google BigQuery on [INTEGER]

Rejected records will be logged in the _sdc_rejected table of the integration's schema.

FIX IT

To resolve the error, offending values in the source must be changed to be within Google BigQuery’s limit for integers.

Use the _sdc_rejected table to identify the root of the issue.
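
To find offending values in the source before they are rejected, a range check like the following Python sketch can be used. It assumes the limit in question is the range of Google BigQuery’s 64-bit INT64 type; `out_of_range_integers` is a hypothetical helper, not a Stitch feature.

```python
from typing import Any

# Range of a signed 64-bit integer (assumed to be the relevant limit).
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def out_of_range_integers(records: list[dict[str, Any]], column: str) -> list[dict[str, Any]]:
    """Return the records whose integer value in `column` falls outside INT64."""
    return [
        r
        for r in records
        if isinstance(r.get(column), int) and not INT64_MIN <= r[column] <= INT64_MAX
    ]


records = [{"n": 1}, {"n": 2**63}, {"n": None}]
offenders = out_of_range_integers(records, "n")
# offenders contains only the record holding 2**63
```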

IF

A column contains decimal data.

THEN

Decimal values will be loaded to Google BigQuery as the data type NUMERIC.

IF

A column contains decimal data that is outside Google BigQuery’s supported range.

THEN

Google BigQuery will reject the records that fall outside the supported maximum range for the NUMERIC data type.

AND

The following error will display in the Notifications tab in Stitch:

decimal out of range for Google BigQuery on [DECIMAL]

Rejected records will be logged in the _sdc_rejected table of the integration's schema.

FIX IT

To resolve the error, offending values in the source must be changed to be within Google BigQuery’s limit for decimals.

Use the _sdc_rejected table to identify the root of the issue.
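
A similar pre-check can be sketched for decimals. The Python example below assumes the limit is the magnitude range of Google BigQuery’s NUMERIC type (precision 38, scale 9); `fits_numeric_range` is a hypothetical helper, and it checks magnitude only, not the number of fractional digits.

```python
from decimal import Decimal

# Largest NUMERIC magnitude: 29 digits before the decimal point, 9 after.
NUMERIC_MAX = Decimal("9" * 29 + "." + "9" * 9)


def fits_numeric_range(value: Decimal) -> bool:
    """Check whether a decimal's magnitude is within the NUMERIC range."""
    # copy_abs() is used instead of abs() because abs() rounds to the
    # current decimal context precision, which could hide an overflow.
    return value.is_finite() and value.copy_abs() <= NUMERIC_MAX


ok = fits_numeric_range(Decimal("12.5"))
too_big = fits_numeric_range(Decimal("1e29"))
# ok is True, too_big is False
```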


Schema change scenarios

Scenarios involving schema changes in the source or structural changes in the destination.

IF

A new column is added to a table already set to replicate.

THEN

If the column has at least one non-NULL value in the source, the column will be created and appended to the end of the table in Google BigQuery.

Note: If the table is using either Key- or Log-based Incremental Replication, backfilled values for the column will only be replicated if:

  1. The records’ Replication Key values are greater than or equal to the last saved maximum Replication Key value for the table, or
  2. The table is reset and a historical re-replication is queued.

Additionally, how records with new column values are loaded depends on the selected loading behavior:

  • Upsert: Existing records will be updated with the new column’s values, if the table has a defined Primary Key column. If the table uses either Key- or Log-based Incremental Replication, you may need to reset the table to backfill historical values for previously replicated records.

    If the table doesn’t have a Primary Key, data will be loaded in an Append-Only manner.

  • Append-Only: Existing records will not be updated with the new column’s values. Instead, records with new column values will be appended to the end of the table.

Refer to the Understanding loading behavior guide for more info and examples.
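
The difference between the two loading behaviors can be illustrated with a small in-memory Python sketch. It models the rules above rather than Stitch’s actual loader; `load` and its parameters are hypothetical names.

```python
from typing import Any, Optional

Row = dict[str, Any]


def load(table: list[Row], incoming: list[Row], behavior: str,
         primary_key: Optional[str] = None) -> list[Row]:
    """Return the destination rows after loading `incoming` into `table`."""
    if behavior == "upsert" and primary_key is not None:
        merged = {row[primary_key]: dict(row) for row in table}
        for row in incoming:
            # Update the matching record, or add the record if it is new.
            merged.setdefault(row[primary_key], {}).update(row)
        return list(merged.values())
    # Append-Only, or Upsert on a table without a Primary Key:
    # existing rows are never changed, new rows are added at the end.
    return table + incoming


existing = [{"id": 1, "name": "Ann"}]
new_rows = [{"id": 1, "name": "Ann", "plan": "pro"}]

upserted = load(existing, new_rows, "upsert", primary_key="id")
appended = load(existing, new_rows, "append")
# upserted: one row for id 1, now including the new "plan" column
# appended: two rows for id 1, the older one without "plan"
```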

IF

A new column is added by you to a Stitch-generated table in Google BigQuery.

THEN

Columns may be added to tables created by Stitch as long as they are nullable, meaning columns don’t have NOT NULL constraints.

IF

A column is deleted at the source.

THEN

How a deleted column is reflected in Google BigQuery depends on the Replication Method used by the table:

  • Key-based Incremental: The column will remain in the destination, and default NULL values will be placed in it going forward.

  • Log-based Incremental: Changes to a source table - including adding or removing columns, changing data types, etc. - require manual intervention before replication can continue. Refer to the Log-based Incremental Replication documentation for more info.

  • Full Table: The column will remain in the destination, and default NULL values will be placed in it going forward.

IF

You remove a column from a Stitch-replicated table in your destination.

THEN

The result of deleting a column from a Stitch-generated table depends on the type of column being removed:

  • Primary Key columns: Changing a table’s Primary Key(s) is not permitted in Google BigQuery. If Primary Key columns are changed, Stitch will stop processing data for the table.

  • General columns: These are all columns that are not prefixed with _sdc or suffixed with a data type. For example: customer_zip, but not customer_zip__st. If new data is detected for the removed column, Stitch will re-create it in Google BigQuery.

    Note: An integration must support selecting columns AND you must deselect the column in Stitch for the column removal to be permanent.

  • _sdc columns: Removing a Stitch replication column will prevent Stitch from loading replicated data into Google BigQuery.

  • Columns with data type suffixes: Removing a column created as a result of accommodating multiple data types will prevent Stitch from loading replicated data into the table. This applies to columns with names such as: customer_zip__st, customer_zip__int, etc.
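
Before removing a column from a Stitch-generated table, it can help to work out which of the categories above it falls into. The Python sketch below is a rough classifier: `classify_column` is a hypothetical helper, it recognizes only the `__st` and `__int` suffixes named in this guide, and `_sdc_batched_at` is used purely as an example column name.

```python
# Data type suffixes named in this guide; the list is not exhaustive.
TYPE_SUFFIXES = ("__st", "__int")


def classify_column(name: str, primary_keys: set[str]) -> str:
    """Classify a destination column using the categories listed above."""
    if name in primary_keys:
        return "primary_key"    # removal stops processing for the table
    if name.startswith("_sdc"):
        return "sdc"            # removal prevents loading
    if name.endswith(TYPE_SUFFIXES):
        return "type_suffixed"  # removal prevents loading into the table
    return "general"            # re-created if new data is detected


kinds = {
    name: classify_column(name, primary_keys={"id"})
    for name in ["id", "_sdc_batched_at", "customer_zip__st", "customer_zip"]
}
# {'id': 'primary_key', '_sdc_batched_at': 'sdc',
#  'customer_zip__st': 'type_suffixed', 'customer_zip': 'general'}
```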


Destination changes

Scenarios involving modifications made to the destination, such as the application of workload/performance management features or user privilege changes.

IF

Indices are applied to Stitch-generated columns in the destination.

THEN

Stitch will respect the index application.

IF

Partitioning is applied to Stitch-generated tables in the destination.

THEN

Stitch will respect the partitioning application. Refer to the Apply table partitioning and clustering guide for more info and instructions.

IF

Clustering is applied to Stitch-generated tables in the destination.

THEN

Stitch will respect the cluster application. Refer to the Apply table partitioning and clustering guide for more info and instructions.

IF

You switch to a different destination of the same type.

THEN

This means the destination type is still Google BigQuery; Stitch may just be connected to a different database in Google BigQuery.

  • For tables using Key-based or Log-based Incremental Replication, replication will continue using the Replication Key’s last saved maximum value. To re-replicate historical data, resetting Replication Keys is required.
  • For tables using Full Table Replication, the table will be fully replicated into the new destination during the next successful job.
  • For webhook integrations, some data loss may be possible due to the continuous, real-time nature of webhooks. Historical data must either be backfilled or re-played.
