Amazon Redshift Destination Reference

Amazon Redshift is a fully managed, cloud-based data warehouse. As Redshift is built for online analytic processing and business intelligence applications, it excels at executing large-scale analytical queries. For this reason, it exhibits far better performance than traditional, row-based relational databases like MySQL and PostgreSQL.

This guide serves as a reference for version 2 of Stitch’s Amazon Redshift destination.

Stitch features

High-level details about Stitch’s implementation of Amazon Redshift, such as supported connection methods, availability on Stitch plans, etc.

Release status	Released
Stitch plan availability	All Stitch plans
Stitch supported regions	North America (AWS us-east-1) Europe (AWS eu-central-1) Operating regions determine the location of the resources Stitch uses to process your data. Learn more.
Supported versions	Not applicable
Connect API availability	Supported This version of the Amazon Redshift destination can be created and managed using Stitch’s Connect API. Learn more.
SSH connections	Supported Stitch supports using SSH tunnels to connect to Amazon Redshift destinations.
SSL connections	Supported Stitch will attempt to use SSL to connect by default. No additional configuration is needed.
VPN connections	Unsupported Virtual Private Network (VPN) connections may be implemented as part of a Premium plan. Contact Stitch Sales for more info.
Static IP addresses	Supported This version of the Amazon Redshift destination has static IP addresses that can be whitelisted.
Default loading behavior	Upsert Note: Append-Only loading will be used if all conditions for Upsert are not met. Learn more.
Nested structure support	Unsupported Nested data structures will be flattened into relational objects. Learn more.

Destination details

Details about the destination, including object names, table and column limits, reserved keywords, etc.

Note: Exceeding the limits noted below will result in loading errors or rejected data.

Maximum record size	20MB
Table name length	127 characters
Column name length	115 characters
Maximum columns per table	1,600
Maximum table size	None
Maximum tables per database	100,000
Case sensitivity	Insensitive
Reserved keywords	Refer to the Reserved keywords documentation.

Encodings, SORT, and DIST keys

To improve your query performance, you can apply encodings, SORT, and DIST keys to Stitch-created tables in your Amazon Redshift destination. Your settings will remain intact even when new data is loaded.

Refer to the Encodings, SORT, and DIST Keys guide for application instructions.

Amazon Redshift pricing

Currently, Amazon Redshift pricing is based on an hourly rate that varies depending on the type and number of nodes in a cluster. Check out Amazon’s pricing page for an in-depth look at their current plan offerings.

So, what’s a node? A node is a single computer that participates in a cluster. Your Amazon Redshift cluster can have one to many nodes; the more nodes, the more data it can store and the faster it can process queries. Amazon currently offers four different types of nodes, each of which has its own CPU, RAM, storage capacity, and storage drive type.

The type and number of node(s) you choose when creating your cluster is dependent on your needs and dataset. We do, however, recommend you set up a multi-node configuration to provide data redundancy.

For some guidance on choosing the right number of nodes for your cluster, check out Amazon’s Determining the Number of Nodes guide.

Replication

Replication process overview
Loading behavior
Primary Keys
Incompatible sources

Replication process overview

A Stitch replication job consists of three stages:

Step 1: Data extraction
Step 2: Preparation
Step 3: Loading

Step 1: Data extraction

Stitch requests and extracts data from a data source. Refer to the System overview guide for a more detailed explanation of the Extraction phase.

Step 2: Preparation

During this phase, the extracted data is buffered in Stitch’s durable, highly available internal data pipeline and readied for loading. Refer to the System overview guide for a more detailed explanation of the Preparation phase.

Step 3: Loading

Stitch loads the data into Amazon Redshift.

Loading behavior

By default, Stitch will use Upsert loading when loading data into Amazon Redshift.

If the conditions for Upsert loading aren’t met, data will be loaded using Append-Only loading.

Refer to the Understanding loading behavior guide for more info and examples.

Primary Keys

Stitch requires Primary Keys to de-dupe incrementally replicated data. To ensure Primary Key data is available, Stitch creates a primary_keys table comment. The comment is an array of strings that contain the names of the Primary Key columns for the table.

For example: A table comment for a table with a single Primary Key:

'{"primary_keys":["id"]}'

And a table comment for a table with a composite Primary Key:

'{"primary_keys":["event_id","created_at"]}'

Note: Removing or incorrectly altering Primary Key table comments can lead to replication issues.

Incompatible sources

The Amazon Redshift destination has reported incompatibilities with some of Stitch's integrations. Refer to the table below for more info.

Integration	Version	Level	Reason
MongoDB	ANY		Flattening nested JSON structures may result in tables with columns that exceed Amazon Redshift’s 1,600 column limit. Learn more.
MongoDB Atlas	v1		Flattening nested JSON structures may result in tables with columns that exceed Amazon Redshift’s 1,600 column limit. Learn more.

See all destination and integration incompatibilities.

Transformations

System tables and columns
Data typing
JSON structures
Column names
Timezones

System tables and columns

Stitch will create the following tables in each integration’s dataset:

_sdc_rejected

Additionally, Stitch will insert system columns (prepended with _sdc) into each table.

Data typing

Stitch converts data types only where needed to ensure the data is accepted by Amazon Redshift. In the table below are the data types Stitch supports for Amazon Redshift destinations, and the Stitch types they map to.

Stitch type: The Stitch data type the source type was mapped to. During the Extraction and Preparing phases, Stitch identifies the data type in the source and then maps it to a common Stitch data type.
Destination type: The destination-compatible data type the Stitch type maps to. This is the data type Stitch will use to store data in Amazon Redshift.
Notes: Details about the data type and/or its allowed values in the destination, if available. If a range is available, values that exceed the noted range will be rejected by Amazon Redshift.

Stitch type	Destination type	Notes
BIGINT	BIGINT	Range : -9223372036854775808 to 9223372036854775807
BOOLEAN	BOOLEAN
DATE	TIMESTAMP	Description: Stored in UTC as `TIMESTAMP WITHOUT TIMEZONE` Range : 4713 BC to 294276 AD
DOUBLE	DOUBLE
FLOAT	DOUBLE
INTEGER	INTEGER	Range : -9223372036854775808 to 9223372036854775807
NUMBER	DECIMAL	Description: Precision must be between 1 and 38; scale must be between 0 and the precision value Range : Precision must be between 1 and 38; scale must be between 0 and the precision value
STRING	STRING	Description: Values that exceed 65535 bytes (64K -1) in size will not be rejected, but truncated to Amazon Redshift’s maximum width Range : 65535 bytes (64K -1)

JSON structures

Amazon Redshift destinations don’t have native support for nested data structures. To ensure nested data can be loaded, Stitch will flatten objects and arrays into columns and subtables, respectively. For more info and examples, refer to the Handling nested data structures guide.

Column names

Column names in Amazon Redshift:

Must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_)
Must begin with a letter
Must be less than the maximum length of 115 characters. Columns that exceed this limit will be rejected by Amazon Redshift.
Must not be prefixed or suffixed with any of Amazon Redshift’s or Stitch’s reserved keyword prefixes or suffixes

Stitch will perform the following transformations to ensure column names adhere to the rules imposed by Amazon Redshift:

Transformation	Source column	Destination column
Convert uppercase and mixed case to lowercase	`CuStOmErId` or `CUSTOMERID`	`customerid`
Remove special characters	`customer#id` or `!CuStOMerID`	`customerid` and `customerid`
Remove non-letter leading characters	`4customerid` or `_customerid`	`customerid`

Timezones

Amazon Redshift will store the value as TIMESTAMP WITHOUT TIMEZONE. In Amazon Redshift, this data is stored without timezone information and expressed as UTC.

Compare destinations

Not sure if Amazon Redshift is the data warehouse for you? Check out the Choosing a Stitch Destination guide to compare each of Stitch’s destination offerings.

Questions? Feedback?

Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.

Related	Troubleshooting
Choosing a Destination Destination & Integration Compatibility Loading Data into Your Destination Switching Destinations Nested Data Structures & Row Count Impact	AWS & Redshift I/O Connection Errors Redshift View Dependency Errors Redshift VACUUM Errors Destination Connection Errors