Amazon Redshift is a fully managed, cloud-based data warehouse. As Redshift is built for online analytic processing and business intelligence applications, it excels at executing large-scale analytical queries. For this reason, it exhibits far better performance than traditional, row-based relational databases like MySQL and PostgreSQL.

This guide serves as a reference for version 2 of Stitch’s Amazon Redshift destination.


Details and features

Stitch features

High-level details about Stitch’s implementation of Amazon Redshift, such as supported connection methods, availability on Stitch plans, etc.

Release status

Released

Stitch plan availability

All Stitch plans

Stitch supported regions
  • North America (AWS us-east-1)
  • Europe (AWS eu-central-1)

Operating regions determine the location of the resources Stitch uses to process your data. Learn more.

Supported versions

Not applicable

Connect API availability Supported

This version of the Amazon Redshift destination can be created and managed using Stitch’s Connect API. Learn more.

SSH connections Supported

Stitch supports using SSH tunnels to connect to Amazon Redshift destinations.

SSL connections Supported

Stitch will attempt to use SSL to connect by default. No additional configuration is needed.

VPN connections Unsupported

Virtual Private Network (VPN) connections may be implemented as part of a Premium plan. Contact Stitch Sales for more info.

Static IP addresses Supported

This version of the Amazon Redshift destination has static IP addresses that can be whitelisted.

Default loading behavior

Upsert
Note: Append-Only loading will be used if all conditions for Upsert are not met. Learn more.

Nested structure support

Unsupported
Nested data structures will be flattened into relational objects. Learn more.

Destination details

Details about the destination, including object names, table and column limits, reserved keywords, etc.

Note: Exceeding the limits noted below will result in loading errors or rejected data.

Maximum record size

20MB

Table name length

127 characters

Column name length

115 characters

Maximum columns per table

1,600

Maximum table size

None

Maximum tables per database

100,000

Case sensitivity

Insensitive

Reserved keywords

Refer to the Reserved keywords documentation.

Encodings, SORT, and DIST keys

To improve your query performance, you can apply encodings, SORT, and DIST keys to Stitch-created tables in your Amazon Redshift destination. Your settings will remain intact even when new data is loaded.

Refer to the Encodings, SORT, and DIST Keys guide for application instructions.

Amazon Redshift pricing

Currently, Amazon Redshift pricing is based on an hourly rate that varies depending on the type and number of nodes in a cluster. Check out Amazon’s pricing page for an in-depth look at their current plan offerings.

So, what’s a node? A node is a single computer that participates in a cluster. Your Amazon Redshift cluster can have one to many nodes; the more nodes, the more data it can store and the faster it can process queries. Amazon currently offers four different types of nodes, each of which has its own CPU, RAM, storage capacity, and storage drive type.

The type and number of node(s) you choose when creating your cluster is dependent on your needs and dataset. We do, however, recommend you set up a multi-node configuration to provide data redundancy.

For some guidance on choosing the right number of nodes for your cluster, check out Amazon’s Determining the Number of Nodes guide.


Replication

Replication process overview

A Stitch replication job consists of three stages:

Step 1: Data extraction

Stitch requests and extracts data from a data source. Refer to the System overview guide for a more detailed explanation of the Extraction phase.

Step 2: Preparation

During this phase, the extracted data is buffered in Stitch’s durable, highly available internal data pipeline and readied for loading. Refer to the System overview guide for a more detailed explanation of the Preparation phase.

Step 3: Loading

Stitch loads the data into Amazon Redshift.

Loading behavior

By default, Stitch will use Upsert loading when loading data into Amazon Redshift.

If the conditions for Upsert loading aren’t met, data will be loaded using Append-Only loading.

Refer to the Understanding loading behavior guide for more info and examples.

Primary Keys

Stitch requires Primary Keys to de-dupe incrementally replicated data. To ensure Primary Key data is available, Stitch creates a primary_keys table comment. The comment is an array of strings that contain the names of the Primary Key columns for the table.

For example: A table comment for a table with a single Primary Key:

'{"primary_keys":["id"]}'

And a table comment for a table with a composite Primary Key:

'{"primary_keys":["event_id","created_at"]}'

Note: Removing or incorrectly altering Primary Key table comments can lead to replication issues.

Incompatible sources

The Amazon Redshift destination has reported incompatibilities with some of Stitch's integrations. Refer to the table below for more info.

Integration Version Level Reason
MongoDB ANY

Flattening nested JSON structures may result in tables with columns that exceed Amazon Redshift’s 1,600 column limit. Learn more.

MongoDB Atlas v1

Flattening nested JSON structures may result in tables with columns that exceed Amazon Redshift’s 1,600 column limit. Learn more.

See all destination and integration incompatibilities.


Transformations

System tables and columns

Stitch will create the following tables in each integration’s dataset:

Additionally, Stitch will insert system columns (prepended with _sdc) into each table.

Data typing

Stitch converts data types only where needed to ensure the data is accepted by Amazon Redshift. In the table below are the data types Stitch supports for Amazon Redshift destinations, and the Stitch types they map to.

  • Stitch type: The Stitch data type the source type was mapped to. During the Extraction and Preparing phases, Stitch identifies the data type in the source and then maps it to a common Stitch data type.
  • Destination type: The destination-compatible data type the Stitch type maps to. This is the data type Stitch will use to store data in Amazon Redshift.
  • Notes: Details about the data type and/or its allowed values in the destination, if available. If a range is available, values that exceed the noted range will be rejected by Amazon Redshift.
Stitch type Destination type Notes
BIGINT BIGINT
  • Range : -9223372036854775808 to 9223372036854775807

BOOLEAN BOOLEAN
DATE TIMESTAMP
  • Description: Stored in UTC as TIMESTAMP WITHOUT TIMEZONE

  • Range : 4713 BC to 294276 AD

DOUBLE DOUBLE
FLOAT DOUBLE
INTEGER INTEGER
  • Range : -9223372036854775808 to 9223372036854775807

NUMBER DECIMAL
  • Description: Precision must be between 1 and 38; scale must be between 0 and the precision value

  • Range : Precision must be between 1 and 38; scale must be between 0 and the precision value

STRING STRING
  • Description: Values that exceed 65535 bytes (64K -1) in size will not be rejected, but truncated to Amazon Redshift’s maximum width

  • Range : 65535 bytes (64K -1)

JSON structures

Amazon Redshift destinations don’t have native support for nested data structures. To ensure nested data can be loaded, Stitch will flatten objects and arrays into columns and subtables, respectively. For more info and examples, refer to the Handling nested data structures guide.

Column names

Column names in Amazon Redshift:

Stitch will perform the following transformations to ensure column names adhere to the rules imposed by Amazon Redshift:

Transformation Source column Destination column
Convert uppercase and mixed case to lowercase CuStOmErId or CUSTOMERID customerid
Remove special characters customer#id or !CuStOMerID customerid and customerid
Remove non-letter leading characters 4customerid or _customerid customerid

Timezones

Amazon Redshift will store the value as TIMESTAMP WITHOUT TIMEZONE. In Amazon Redshift, this data is stored without timezone information and expressed as UTC.


Compare destinations

Not sure if Amazon Redshift is the data warehouse for you? Check out the Choosing a Stitch Destination guide to compare each of Stitch’s destination offerings.


Questions? Feedback?

Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.