From time to time, Stitch will encounter data that can’t be loaded losslessly into the destination table in your data warehouse. When this happens, Stitch may have to alter the structure of the table in order to successfully load the data.

This can happen for a few reasons:

  • Columns have been added to the table,
  • Columns have been removed from the table, or
  • Multiple data types exist in a single column

In this guide, you’ll find examples of how each destination handles each of these scenarios.


Adding & Removing Columns

  • If a new column is added to the source table, that column will be appended to the end of the table in the data warehouse.
  • If a column is removed from the source table, default NULLs will be placed into the column going forward. Stitch does not remove columns from the data warehouse.
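The two rules above can be sketched in a few lines of Python. This is an illustrative sketch of the loading behavior, not Stitch's actual implementation:

```python
def align_record(record, columns):
    """Align an incoming record to the destination table's column list.

    Illustrative sketch of the behavior described above, not Stitch's code:
    - a key not yet in the table becomes a new column, appended at the end
    - a column absent from the record is filled with None (loaded as NULL)
    """
    for key in record:
        if key not in columns:
            columns.append(key)  # new source column -> appended to the end
    # columns removed at the source simply receive None going forward
    return columns, [record.get(col) for col in columns]

# Example: `email` was added at the source; `rep_name` was removed.
cols, row = align_record({"id": 3, "email": "doc@example.com"}, ["id", "rep_name"])
# cols is now ["id", "rep_name", "email"]; row is [3, None, "doc@example.com"]
```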

Widening VARCHAR Columns

Depending on the data warehouse you’re using, VARCHAR columns of varying sizes can result in Stitch taking different actions to accommodate the data.

Click a tab to see how each destination handles this scenario.

In BigQuery, no widening will occur as all VARCHAR data is stored as STRING, which has no fixed length.

In Panoply destinations, when a VARCHAR column increases in size Stitch will automatically widen the column to accommodate the data. To do this, Stitch must drop and re-create the column in your data warehouse, which will require temporarily dropping dependent views.

Let’s say you have an orders table in an integration called rep_sales. During the structure sync, Stitch will create a schema called rep_sales in your data warehouse and inside this schema, a table called orders.

The table is structured as follows, excluding the _sdc columns:

| Column           | Data Type    |
|------------------+--------------|
| id [pk]          | BIGINT       |
| rep_name         | VARCHAR(128) |
| order_amount     | BIGINT       |
| order_confirmed  | BOOLEAN      |

During the first sync, the following rows are replicated:

| id [pk] | rep_name    | order_amount | order_confirmed |
|---------+-------------+--------------+-----------------|
| 1       | Marty McFly | 12           | true            |
| 2       | Doc Brown   | 10           | true            |

During the next sync, the following rows are replicated:

| id [pk] | rep_name                         | order_amount | order_confirmed |
|---------+----------------------------------+--------------+-----------------|
| 3       | MartyMcFlyMessedUpTheTime        | 13           | true            |
|         | ContinuumAndEliminatedHisOwn     |              |                 |
|         | ExistenceAndNowThereAren’tAny    |              |                 |
|         | HoverboardsInAnAwesomeHitSequel  |              |                 |
|         | WhichIsBothWittyAndFuturistic    |              |                 |
| 4       | Doc Brown                        | 11           | true            |

Take a look at the rep_name column for row 3 - it definitely contains data that exceeds 128 characters.

To accommodate the data, Stitch will widen the rep_name column, thus changing the structure of the table:

| Column           | Data Type    |
|------------------+--------------|
| id [pk]          | BIGINT       |
| rep_name         | VARCHAR(256) |     // widened column
| order_amount     | BIGINT       |
| order_confirmed  | BOOLEAN      |
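The widening step can be sketched as follows. Note the assumptions: the 128 → 256 jump in the example suggests doubling, but Stitch's actual sizing rules are internal to the loader, and the 65535 ceiling mirrors the VARCHAR maximum in Redshift-based destinations:

```python
def widened_varchar_size(current_size, value, max_size=65535):
    """Return a VARCHAR size large enough to hold `value`.

    Illustrative sketch only: doubling is inferred from the 128 -> 256
    example above; Stitch's real sizing logic may differ. `max_size`
    mirrors the VARCHAR ceiling in Redshift-based destinations.
    """
    size = current_size
    while size < len(value) and size < max_size:
        size = min(size * 2, max_size)
    return size

long_name = "M" * 150                  # exceeds VARCHAR(128)
widened_varchar_size(128, long_name)   # -> 256, as in the example above
```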

In PostgreSQL, no widening will occur as all VARCHAR data is stored as TEXT.

In Redshift destinations, when a VARCHAR column increases in size Stitch will automatically widen the column to accommodate the data. To do this, Stitch must drop and re-create the column in your data warehouse, which will require temporarily dropping dependent views.

Let’s say you have an orders table in an integration called rep_sales. During the structure sync, Stitch will create a schema called rep_sales in your data warehouse and inside this schema, a table called orders.

The table is structured as follows, excluding the _sdc columns:

| Column           | Data Type    |
|------------------+--------------|
| id [pk]          | BIGINT       |
| rep_name         | VARCHAR(128) |
| order_amount     | BIGINT       |
| order_confirmed  | BOOLEAN      |

During the first sync, the following rows are replicated:

| id [pk] | rep_name    | order_amount | order_confirmed |
|---------+-------------+--------------+-----------------|
| 1       | Marty McFly | 12           | true            |
| 2       | Doc Brown   | 10           | true            |

During the next sync, the following rows are replicated:

| id [pk] | rep_name                         | order_amount | order_confirmed |
|---------+----------------------------------+--------------+-----------------|
| 3       | MartyMcFlyMessedUpTheTime        | 13           | true            |
|         | ContinuumAndEliminatedHisOwn     |              |                 |
|         | ExistenceAndNowThereAren’tAny    |              |                 |
|         | HoverboardsInAnAwesomeHitSequel  |              |                 |
|         | WhichIsBothWittyAndFuturistic    |              |                 |
| 4       | Doc Brown                        | 11           | true            |

Take a look at the rep_name column for row 3 - it definitely contains data that exceeds 128 characters.

To accommodate the data, Stitch will widen the rep_name column, thus changing the structure of the table:

| Column           | Data Type    |
|------------------+--------------|
| id [pk]          | BIGINT       |
| rep_name         | VARCHAR(256) |     // widened column
| order_amount     | BIGINT       |
| order_confirmed  | BOOLEAN      |

Columns with Mixed Data Types

Stitch requires that there only be one data type per column to properly type and store your data. If a column contains multiple data types, Stitch will create additional columns and append the data type to the column name. The end result will vary slightly from destination to destination - the result for each destination can be found after the example.

Let’s take a look at an example using an orders table in an integration called rep_sales. During the structure sync, Stitch will create a schema called rep_sales in your data warehouse and inside this schema, a table called orders.

Excluding the _sdc columns, Stitch determines this is the table’s structure:

| Column           | Data Type    |
|------------------+--------------|
| id [pk]          | BIGINT       |
| order_amount     | BIGINT       |
| order_confirmed  | BOOLEAN      |

During the first sync, the following rows are replicated:

| id [pk] | order_amount | order_confirmed |
|---------+--------------+-----------------|
| 1       | 12           |                 |
| 2       | 10           | true            |

During the next sync, the following rows are replicated:

| id [pk] | order_amount | order_confirmed |
|---------+--------------+-----------------|
| 3       | 13.25        | true            |
| 4       | 11           | yes             |

Stitch will detect that the data types in these newly replicated rows differ from the ones in the initial sync. In this case:

  • order_amount was originally a BIGINT, but can sometimes be a decimal
  • order_confirmed was originally a BOOLEAN, but can sometimes be a string

To accommodate the data, Stitch will create a new column for the newly detected data type and append the data type to the column name.

How columns are named as a result of “splitting” mixed data types depends on the type of destination you’re using. In the tabs below, you’ll see what the structure of the orders table would be altered to for each destination type.
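As a rough sketch of the naming rule (illustrative only, not Stitch's implementation; the abbreviated __fl/__st suffixes are taken from the BigQuery examples in this guide):

```python
# Illustrative sketch of the "split column" naming rule; not Stitch's code.
# BigQuery keeps the original column name and appends an abbreviated type
# suffix to new columns; Panoply, PostgreSQL, and Redshift also rename the
# original column, using the full type name.
TYPE_ABBREVIATIONS = {"float": "fl", "string": "st"}  # assumed, per the examples

def split_column_name(column, data_type, is_original, destination):
    if destination == "bigquery":
        if is_original:
            return column  # original column keeps its name
        return f"{column}__{TYPE_ABBREVIATIONS[data_type]}"
    return f"{column}__{data_type}"  # full type name for the other destinations

split_column_name("order_amount", "bigint", True, "bigquery")   # 'order_amount'
split_column_name("order_amount", "float", False, "bigquery")   # 'order_amount__fl'
split_column_name("order_amount", "double", False, "redshift")  # 'order_amount__double'
```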

Click a tab to see how each destination handles this scenario.

In BigQuery destinations, when mixed data types are detected in the same column:

  • The original column’s name will remain as-is. It will be typed according to the first data type Stitch detected for the column - for order_amount for example, that’s BIGINT
  • Subsequent columns will have the data type appended to their name. Additionally, only data of this type will be stored in this column. For order_amount__fl, for example, that’s data determined to be of type FLOAT

In the case of the orders table, its structure would be altered to:

| Column              | Data Type  |
|---------------------+------------|
| id [pk]             | BIGINT     |
| order_amount        | BIGINT     |    // ORIGINAL order_amount column
| order_amount__fl    | FLOAT      |    // NEW order_amount column for `FLOAT` data
| order_confirmed     | BOOLEAN    |    // ORIGINAL order_confirmed column
| order_confirmed__st | STRING     |    // NEW order_confirmed column for string data

In Panoply destinations, when mixed data types are detected in the same column:

  • The original column will be re-named to include the original (first detected) data type and only data of the original type will be stored in this column. For example: only data determined to be of type BIGINT will be stored in order_amount__bigint
  • Subsequent columns will have the data type appended to their name. Additionally, only data of this type will be stored in this column. For order_amount__double, for example, that’s data determined to be of type DOUBLE PRECISION

In the case of the orders table, its structure would be altered to:

| Column                   | Data Type        |
|--------------------------+------------------|
| id [pk]                  | BIGINT           |
| order_amount__bigint     | BIGINT           |    // ORIGINAL order_amount column
| order_amount__double     | DOUBLE PRECISION |    // NEW order_amount column
| order_confirmed__boolean | BOOLEAN          |    // ORIGINAL order_confirmed column
| order_confirmed__string  | VARCHAR(128)     |    // NEW order_confirmed column

In PostgreSQL destinations, when mixed data types are detected in the same column:

  • The original column will be re-named to include the original (first detected) data type and only data of the original type will be stored in this column. For example: only data determined to be of type BIGINT will be stored in order_amount__bigint
  • Subsequent columns will have the data type appended to their name. Additionally, only data of this type will be stored in this column. For order_amount__double, for example, that’s data determined to be of type DOUBLE PRECISION

In the case of the orders table, its structure would be altered to:

| Column                   | Data Type        |
|--------------------------+------------------|
| id [pk]                  | BIGINT           |
| order_amount__bigint     | BIGINT           |    // ORIGINAL order_amount column
| order_amount__double     | DOUBLE PRECISION |    // NEW order_amount column
| order_confirmed__boolean | BOOLEAN          |    // ORIGINAL order_confirmed column
| order_confirmed__string  | TEXT             |    // NEW order_confirmed column

In Redshift destinations, when mixed data types are detected in the same column:

  • The original column will be re-named to include the original (first detected) data type and only data of the original type will be stored in this column. For example: only data determined to be of type BIGINT will be stored in order_amount__bigint
  • Subsequent columns will have the data type appended to their name. Additionally, only data of this type will be stored in this column. For order_amount__double, for example, that’s data determined to be of type DOUBLE PRECISION

In the case of the orders table, its structure would be altered to:

| Column                   | Data Type        |
|--------------------------+------------------|
| id [pk]                  | BIGINT           |
| order_amount__bigint     | BIGINT           |    // ORIGINAL order_amount column
| order_amount__double     | DOUBLE PRECISION |    // NEW order_amount column
| order_confirmed__boolean | BOOLEAN          |    // ORIGINAL order_confirmed column
| order_confirmed__string  | VARCHAR(128)     |    // NEW order_confirmed column

Redshift/Panoply Dependent Views

When a table’s structure has changed and Stitch needs to load data into the new structure, an ALTER command will be issued. To successfully run this command, dependent views must be temporarily ‘dropped’ to allow Stitch to re-create the table. If you receive an error that looks like the following, dependent views may be the root cause:

ERROR: cannot drop table SCHEMA.TABLE column type because other objects depend on it

Hint: Use DROP ... CASCADE to drop the dependent objects too.

This troubleshooting guide will walk you through how to locate the dependent views and temporarily drop them. An hour or two is usually sufficient to complete the process, though some very large tables may require more time.


Identifying & Resolving Record Rejections

If Stitch is unable to fit data into a table, you’ll experience some missing data on your end. To troubleshoot, we recommend taking a look at some of the common causes for dropped/missing data:

  • The table name is longer than the data warehouse’s character limit
  • A column name is longer than the data warehouse’s character limit
  • Stitch’s permissions to your data warehouse have been revoked
  • There are column names appended with __string, __bigint, etc. These are reserved for Stitch.
  • There are column names prepended with _sdc or _rjm. These are reserved for Stitch.
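A quick sanity check for the last two causes can be sketched like this. The suffix list is illustrative, covering only the type suffixes shown in this guide, and the check is not Stitch's own validation:

```python
# Illustrative check for Stitch-reserved column names; the suffix list is
# not exhaustive -- it only covers the type suffixes shown in this guide.
RESERVED_PREFIXES = ("_sdc", "_rjm")
RESERVED_SUFFIXES = ("__string", "__bigint", "__double", "__boolean", "__fl", "__st")

def conflicts_with_stitch(column_name):
    return (column_name.startswith(RESERVED_PREFIXES)
            or column_name.endswith(RESERVED_SUFFIXES))

conflicts_with_stitch("_sdc_batched_at")          # True: reserved prefix
conflicts_with_stitch("order_confirmed__string")  # True: reserved suffix
conflicts_with_stitch("order_amount")             # False
```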

To pinpoint the problem, check the Rejection Records table for the integration or the Data Loading Behavior rollup for your data warehouse. These rollups contain lists of common data loading scenarios and the expected behavior, including why some records may be rejected by the data warehouse.



Questions? Feedback?

Did this article help? If you have questions or feedback, please reach out to us.