Amazon Redshift Data Loading Reference

Learn how Stitch will load data from your integrations into Stitch’s Amazon Redshift destination.

In this guide, we’ll cover data loading scenarios involving:

The destinations this guide is applicable to
Primary key columns
Replication keys and how data is loaded as a result
Object identifiers in the destination, including naming limitations and transformations
Table creation and modification in the destination
Various data types, including how data is typed and structured in the destination
Schema changes in the source or structural changes in the destination
Modifications made to the destination, such as the application of workload/performance management features or user privilege changes

Applicable destination types

This guide is applicable to the all variations of the Amazon Redshift destination, including:

Primary Key scenarios

Scenarios involving Primary Key columns.

A table without a Primary Key is replicated.

IF	A table without a Primary Key is replicated.
THEN	Initial job: Table is created without Primary Key and no `NOT NULL` columns. Subsequent jobs: If using Key-based Incremental or Log-based Incremental Replication, data will be added to the table in an Append-Only fashion. If using Full Table Replication, the table will be overwritten in its entirety during each job.

A table with a single Primary Key is replicated.

THEN

Initial job: Table is created without Primary Key and no NOT NULL columns. Primary Key info is stored as a comment on the table.

Note: Incorrectly altering or removing a table’s Primary Key comment will result in issues with replication.
Subsequent jobs: If using Key-based Incremental or Log-based Incremental Replication, data will be de-duped based on the Primary Key column and upserted into the table.

If using Full Table Replication, the table will be overwritten in its entirety during each sync.

A table with multiple Primary Keys is replicated.

THEN

Initial job: Table is created without Primary Key and no NOT NULL columns. Primary Key info is stored as a comment on the table.

Note: Incorrectly altering or removing a table’s Primary Key comment will result in issues with replication.
Subsequent jobs: If using Key-based Incremental or Log-based Incremental Replication, data will be de-duped based on the Primary Key columns and upserted into the table.

If using Full Table Replication, the table will be overwritten in its entirety during each job.

The table's Primary Key(s) is/are changed.

The table’s Primary Key(s) is/are changed.

THEN

If using Key-based Incremental or Log-based Incremental Replication, data will be added to the table in an Append-Only fashion.

If using Full Table Replication, the table will be overwritten in its entirety during each job.

You remove the Primary Key column(s) for a table in Amazon Redshift.

IF	You remove the Primary Key column(s) for a table in Amazon Redshift.
THEN	Data will continue to be loaded into the table in an Append-Only fashion.

Replication Key scenarios

Scenarios involving Replication Keys and how data is loaded as a result.

A table using Key-based Incremental Replication is replicated where the Replication Key column contains NULL values.

IF	A table using Key-based Incremental Replication is replicated where the Replication Key column contains `NULL` values.
THEN	During the initial job, the table will be created and all rows will be replicated. During subsequent jobs, only rows with populated Replication Keys will be replicated and persisted to Amazon Redshift.

Object naming scenarios

Scenarios involving object identifiers in the destination, including naming limitations and transformations.

A table name contains more characters than allowed by Amazon Redshift.

IF	A table name contains more characters than allowed by Amazon Redshift.
THEN	Amazon Redshift will reject all data for the table.
AND	The following error will display in the Notifications tab in Stitch: `Table name [TABLE] is too long for Amazon Redshift` Rejected records will be logged in the `_sdc_rejected` table of the integration's schema. Learn more.
FIX IT	If possible, change the table name in the source to be less than Amazon Redshift’s character limit of 127 characters. Use the `_sdc_rejected` table to identify the root of the issue.

A column name contains more characters than allowed by Amazon Redshift.

IF	A column name contains more characters than allowed by Amazon Redshift.
THEN	Amazon Redshift will reject columns with names that exceed the column character limit. Other columns in the table will persist to Amazon Redshift.
AND	The following error will display in the Notifications tab in Stitch: `Column name [COLUMN] is too long for Amazon Redshift` Rejected records will be logged in the `_sdc_rejected` table of the integration's schema. Learn more.
FIX IT	If possible, change the column name in the source to be less than Amazon Redshift’s character limit of 115 characters. Use the `_sdc_rejected` table to identify the root of the issue.

Two columns are replicated that canonicalize to the same name.

IF	Two columns are replicated that canonicalize to the same name.
THEN	For example: A table containing both `CustomerId` and `customerid` columns. Amazon Redshift will reject the records and create a log for the rejected records in the `_sdc_rejected` table in that integration’s schema.
AND	The following error will display in the Notifications tab in Stitch: `Field collision on [COLUMN_NAME]` Rejected records will be logged in the `_sdc_rejected` table of the integration's schema. Learn more.
FIX IT	If possible, re-name one of the columns in the source so that both column names will be unique when replicated to Amazon Redshift. Use the `_sdc_rejected` table to identify the root of the issue.

A column is replicated that has a mixed-case name.

THEN

Amazon Redshift will convert letters to lowercase. For example:

Columns in Source	Columns in Amazon Redshift
`CuStOmErId`	`customerid`
`customerID`	`customerid`

A column is replicated that has a name with spaces.

THEN

Amazon Redshift will maintain spaces. For example:

Columns in Source	Columns in Amazon Redshift
`customer id`	`customer id`
`CUSTOMER ID`	`customer id`

A column is replicated with a name that contains unsupported special characters.

THEN

Amazon Redshift will remove all special characters with the exception of underscores and dollar signs. For example:

Columns in Source		Columns in Amazon Redshift
`customer!id`		`customerid`
`!CUSTOMERID`		`customerid`

A column is replicated with a name that begins with a non-letter.

IF	A column is replicated with a name that begins with a non-letter.
THEN	This scenario is unremarkable to Amazon Redshift.

Table scenarios

Scenarios involving table creation and modification in the destination.

A table contains entirely NULL columns.

IF	A table contains entirely `NULL` columns.
THEN	No table is created in Amazon Redshift. At least one column must have a non-`NULL` value for Stitch to create a table in Amazon Redshift.

A table arrives with more columns than Amazon Redshift allows.

IF	A table arrives with more columns than Amazon Redshift allows.
THEN	Amazon Redshift will reject all data for the table.
AND	The following error will display in the Notifications tab in Stitch: `ERROR: too many columns` Rejected records will be logged in the `_sdc_rejected` table of the integration's schema. Learn more.
FIX IT	If possible, deselect some columns to allow Stitch to load data into Amazon Redshift for the table. Amazon Redshift has a limit of 1,600 columns per table. Use the `_sdc_rejected` table to identify the root of the issue.

Data typing scenarios

Scenarios involving various data types, including how data is typed and structured in the destination.

Stitch detects multiple data types for a single column.

THEN

To accommodate data of varying types, Stitch will create multiple columns to ensure data is loaded with the correct type. In the destination, this will look like the column has been “split”.

For example: Stitch first detected that order_confirmed contained BOOLEAN data, but during a subsequent job, detected STRING values. To accommodate data of varying types, Stitch will:

Re-name the original column. The new name will be the column name and a suffix indicating the data type Stitch detected first. In this example, the new column name would be order_confirmed__bl, __bl indicating BOOLEAN.
Store data for the first-detected data type in the re-named column. In this example, all BOOLEAN data will be stored in order_confirmed__bl.
Create additional columns to store other data types, one for each data type detected. Suffixes indicating the data type will be appended to the column names. In this example, STRING data will be stored in order_confirmed__st.

Data is replicated to Amazon Redshift that is nested, containing many top-level properties and potentially nested sub-properties.

THEN

To ensure nested data can be loaded, Stitch will flatten objects and arrays into columns and subtables, respectively.

For more info and examples, refer to the Handling nested data structures guide.

A VARCHAR column is replicated to Amazon Redshift.

A VARCHAR column is replicated to Amazon Redshift.

THEN

Amazon Redshift will type the column as VARCHAR(n), where n is the width of the widest record in the column.

For example: During the initial replication job, data in a VARCHAR column is found to have a width greater than 128. Amazon Redshift will create type this column as VARCHAR(256) to accommodate the data.

VARCHAR data is loaded that exceeds the current maximum size for the column.

VARCHAR data is loaded that exceeds the current maximum size for the column.

THEN

Amazon Redshift will widen the column to accommodate the width of the largest record up to the limit of 65535 bytes (64K -1).

For example: during the initial replication job, a VARCHAR column is created and typed as VARCHAR(128).

During a subsequent job, Stitch identifies data for this column that exceeds the current size maximum of 128 but is less than 256.

To accommodate the data, the column will become VARCHAR(256).

VARCHAR data is loaded that exceeds Amazon Redshift's supported limit.

IF	VARCHAR data is loaded that exceeds Amazon Redshift’s supported limit.
THEN	Amazon Redshift will truncate the data to the maximum width of `65535 bytes (64K -1)`.

A column containing date data with timezone info is replicated to Amazon Redshift.

THEN

Amazon Redshift will store the value as TIMESTAMP WITHOUT TIMEZONE. In Amazon Redshift, TIMESTAMP WITHOUT TIME ZONE data is stored without timezone information and expressed as UTC.

For more info on this data type, refer to Amazon’s Datetime Type documentation.

A column contains timestamp data that is outside Amazon Redshift's supported range.

IF	A column contains timestamp data that is outside Amazon Redshift’s supported range.
THEN	Amazon Redshift will reject the records that fall outside the supported range.
AND	The following error will display in the Notifications tab in Stitch: `timestamp out of range for Amazon Redshift on [TIMESTAMP]` Rejected records will be logged in the `_sdc_rejected` table of the integration's schema. Learn more.
FIX IT	To resolve the error, offending values in the source must be changed to be within Amazon Redshift’s timestamp range. Use the `_sdc_rejected` table to identify the root of the issue.

A column contains integer data that is outside Amazon Redshift's supported range.

IF	A column contains integer data that is outside Amazon Redshift’s supported range.
THEN	Amazon Redshift will reject the records that fall outside the supported range.
AND	The following error will display in the Notifications tab in Stitch: `integer out of range for Amazon Redshift on [INTEGER]` Rejected records will be logged in the `_sdc_rejected` table of the integration's schema. Learn more.
FIX IT	To resolve the error, offending values in the source must be changed to be within Amazon Redshift’s limit for integers. Use the `_sdc_rejected` table to identify the root of the issue.

A column contains decimal data that is outside Amazon Redshift's supported range.

IF	A column contains decimal data that is outside Amazon Redshift’s supported range.
THEN	Amazon Redshift will reject the records that fall outside the supported range.
AND	The following error will display in the Notifications tab in Stitch: `decimal out of range for Amazon Redshift on [DECIMAL]` Rejected records will be logged in the `_sdc_rejected` table of the integration's schema. Learn more.
FIX IT	To resolve the error, offending values in the source must be changed to be within Amazon Redshift’s limit for decimal data. Use the `_sdc_rejected` table to identify the root of the issue.

Schema change scenarios

Scenarios involving schema changes in the source or structural changes in the destination.

A new column is added in table already set to replicate.

THEN

If the column has at least one non-NULL value in the source, the column will be created and appended to the end of the table in Amazon Redshift.

Note: If the table using either Key- or Log-based Incremental Replication, backfilled values for the column will only be replicated if:

The records’ Replication Key values are greater than or equal to the last saved maximum Replication Key value for the table, or
The table is reset and a historical re-replication is queued.

A new column is added by you to a Stitch-generated table in Amazon Redshift.

IF	A new column is added by you to a Stitch-generated table in Amazon Redshift.
THEN	Columns may be added to tables created by Stitch as long as they are nullable, meaning columns don’t have `NOT NULL` constraints.

A column is deleted at the source.

THEN

How a deleted column is reflected in Amazon Redshift depends on the Replication Method used by the table:

Key-based Incremental: The column will remain in the destination, and default NULL values will be placed in it going forward.
Log-based Incremental: Changes to a source table - including adding or removing columns, changing data types, etc. - require manual intervention before replication can continue. Refer to the Log-based Incremental Replication documentation for more info.
Full Table: The column will remain in the destination, and default NULL values will be placed in it going forward.

You remove a column from a Stitch-replicated table in your destination.

THEN

The result of deleting a column from a Stitch-generated table depends on the type of column being removed:

Primary Key columns: Changing a table’s Primary Key(s) is not permitted in Amazon Redshift. If Primary Key columns are changed, Stitch will stop processing data for the table.
General columns: If new data is detected for the removed column, Stitch will re-create it in Amazon Redshift. This refers to all columns that are not prepended by _sdc or suffixed by a data type. For example: customer_zip, but not customer_zip__st.

Note: An integration must support selecting columns AND you must deselect the column in Stitch for the column removal to be permanent.
_sdc columns: Removing a Stitch replication column will prevent Stitch from loading replicated data into Amazon Redshift.
Columns with data type suffixes: Removing a column created as result of accommodating multiple data types will prevent Stitch from loading replicated data into the table. This applies to columns with names such as: customer_zip__st, customer_zip__int, etc.

Destination changes

Scenarios involving modifications made to the destination, such as the application of workload/performance management features or user privilege changes.

Indices are applied to Stitch-generated columns in the destination.

IF	Indices are applied to Stitch-generated columns in the destination.
THEN	Stitch will respect the index application.

SORT and/or DISTRIBUTION Keys are applied to Stitch-generated columns in the destination.

IF	SORT and/or DISTRIBUTION Keys are applied to Stitch-generated columns in the destination.
THEN	Stitch will respect the application when loading data.

You switch to a different destination of the same type.

THEN

This means the destination type is still Amazon Redshift, Stitch may just be connected a different database in Amazon Redshift.

For tables using Key-based or Log-based Incremental Replication, replication will continue using the Replication’s Key last saved maximum value. To re-replicate historical data, resetting Replication Keys is required.
For tables using Full Table Replication, the table will be fully replicated into the new destination during the next successful job.
For webhook integrations, some data loss may be possible due to the continuous, real-time nature of webhooks. Historical data must either be backfilled or re-played.

Questions? Feedback?

Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.

Related	Troubleshooting
Syncing Data Syncing New Columns Replication Methods Replication Keys Resetting Replication Keys	Data Discrepancy Guide Missing Objects Missing Columns & NULL Values Non-Replicating Data & Unsupported Data Types Destination & Integration Compatibility