Setup Requirements

To set up in Stitch, you need:

  • An Amazon Web Services (AWS) account. Signing up is free - click here or go to https://aws.amazon.com to create an account if you don’t have one already.

  • Some technical know-how and familiarity with AWS.

  • Optional: Create a non-default VPC. A VPC, or Virtual Private Cloud, is a scalable virtual network that resembles a traditional network used in a data center.

    All AWS accounts created after December 3, 2014, include a default VPC in each AWS region. This means that you can use the default VPC to launch your Redshift cluster instead of spinning up your own.

    Steps for creating a VPC are outside the scope of this tutorial, so the examples in this guide will use a default VPC.

    This is optional, but if you want to use your own VPC, you need to create it before continuing. Amazon’s Getting Started with VPC guide can walk you through how to do this. Note that the cluster you use for Stitch must reside in a publicly accessible subnet.


Step 1: Create Cluster Login Credentials

To get started, sign into your AWS account and navigate to the Redshift Dashboard page. As of December 13, 2016, you can get there by clicking Services (top left corner), then clicking Redshift in the Databases section.

  1. On the Redshift Dashboard page, click the Lauch Cluster button.
  2. The Cluster Details page will display.
  3. Fill in the cluster details:
    • Cluster Identifier: This is the unique key that defines a cluster.
    • Database Name: This is the initial database for the cluster.
    • Port: A port is an opening in a firewall that allows in and outbound traffic. By default, Redshift uses port 5439.
  4. Next, you’ll create the master login credentials. You can use these credentials to connect Stitch to your Redshift cluster if you like, or if you want to be more selective with the permissions, you can create an additional user for Stitch later on in the tutorial.
    • Master User Name: This is the name of the master user for the cluster.
    • Master User Password: This is the password associated with the cluster’s master user.
  5. Re-enter the master user’s password to confirm it.
  6. Click Continue.

Step 2: Select Nodes & Cluster Types

On the Node Configuration page, you’ll select the node type, define the cluster type, and decide how many nodes to use.

A node is a single computer that participates in a cluster. A Redshift cluster can have one to many nodes; the more nodes, the more data it can store and the faster it can process queries. Amazon currently offers four different types of nodes, each of which has its own CPU, RAM, storage capacity, and storage drive type. In addition, each node also has its own pricing plan.

The type and number of node(s) you choose when creating your cluster is dependent on your needs and dataset. We do, however, recommend you set up a multi-node configuration to provide data redundancy. For some guidance on choosing the number of nodes, click here to read the ‘Determining the Number of Nodes’ section of the Amazon Redshift guide.
  1. Select the Node Type you want to use.
  2. Select the Cluster Type.
  3. Enter the number of Compute Nodes you want to use.
  4. Click Continue.

Step 3: Configure & Launch the Redshift Cluster

Next, you’ll define the additional configuration settings for the Redshift cluster. For our purposes, we’re going to leave most of the settings as-is and focus on the Configure Networking Options and Security Groups sections.

Step 3.1: Configure Networking Options

In the Configure Networking Options section, define these fields:

  • Choose a VPC: From the drop-down, select the VPC you want to launch the cluster in. This will also cause the other required fields to display.

  • Cluster Subnet Group: Set this to a public subnet in your account. Note a public subnet is required for the connection to be successful.

    The default VPC included with all new AWS accounts also includes a default public subnet, which is what we’re using here.

  • Publicly Accessible: Set this to Yes. This must be set to Yes for Stitch to access your Redshift instance.

Step 3.2: Define a Security Group

The last step to configuring your cluster is to define the Security Group associated with the cluster.

In the VPC Security Groups section, select the Security Group you want to associate with the cluster.

We’re using the default Security Group that’s included with all new AWS accounts, but you can choose any one you like. You’ll edit the access rules for the Security Group in the next section, which will allow the group to access the cluster.

Here’s what the configuration looks like for our sample cluster:

Redshift Additional Configuration page.

When finished, click the Continue button.

Step 3.3: Review & Launch the Redshift Cluster

A review page containing all the details about your cluster (except for the master password) will display after you click Continue on the Additional Configuration page.

Additionally, the pricing info for the cluster will display at the bottom of the page.

After you’ve reviewed and confirmed the settings, click Launch Cluster to launch the cluster. Note that this process can take a few minutes - the status in the Cluster Dashboard page will change to ‘available’ when the process completes.


Step 4: Configure Security & Access Settings

When the cluster creation process is complete, you can move onto editing the cluster security group’s access rules.

To ensure that Stitch can access your Redshift instance and load data into your data warehouse, you’ll create a Security Group Connection Rule for the security group associated with the cluster. This will whitelist the IP addresses Stitch uses and ensure we can access your Redshift.

  1. On the Redshift Dashboard page, click the Clusters option in the left nav bar.
  2. In the Clusters page, locate and click on the cluster you just created. This will display the Cluster Details page.
  3. Locate the Cluster Properties section.
  4. Click the name of the security group in the VPC Security Groups field.
  5. This will open the Security Groups page.
  6. The Security Group associated with your Redshift instance should automatically open in the Security Groups page. If it doesn’t, click the name of the appropriate group.
  7. The Security Group’s details will display in tabs in the bottom part of the page. Click the Inbound tab:

    The Security Group page in AWS

  8. In the Inbound tab, click the Edit button. This will open the Edit inbound rules window.
  9. This is where you’ll add the IP addresses that can access the instance. Fill out the fields in this screen as follows:
    • Type: Select Custom TCP Rule from the drop-down.
    • Protocol: This will default to TCP - leave it as-is.
    • Port Range: Enter the port associated with the instance. This will be the same port you used to create the instance. (5439 is the default)
    • Source: The drop-down will default to Custom - leave it as-is.

      In the field next to the drop-down, copy and paste one of the following IP addresses. You need to add them one at a time, meaning each IP address gets its own rule:

      • 52.23.137.21/32

      • 52.204.223.208/32

      • 52.204.228.32/32

      • 52.204.230.227/32

      Below is an example of what a rule should look like:

      Whitelisting Stitch IP addresses through Inbound Security Group rules.

  10. Click Add Rule to add an additional Inbound rule.
  11. Repeat the previous two steps until all of Stitch’s IP addresses (along with any others you may want to whitelist) have been added.
  12. Click Save.

Step 5: Create a Stitch User

As we mentioned when you created the instance, the credentials used to launch belong to the master user, or admin.

If you want to control the permission settings, you’ll need to create an additional database user for Stitch. An added benefit is that Stitch will be easily distinguishable in any logs or audits.

Requirements for creating a Redshift user

You must be a superuser to create a Redshift user. If you’re not a superuser or you’re not sure what this means, please contact someone on your technical team before proceeding.

Required Redshift permissions for Stitch

The Stitch database user requires the following to successfully load data into Redshift:

Action Reason
CREATE schemas, tables, and views Stitch requires these permissions to create the necessary database objects to load and store your data.
SELECT from tables in the pg_catalog and information_schema schemas Stitch uses the data in the systems tables to verify the existence of integration schemas and tables, and gain information about their structure prior to loading data.

Note: Stitch will only ever read data from the systems tables.
Ownership of all integration schemas and tables Stitch occasionally needs to run COMMENT and ALTER TABLE commands to properly insert data.

While you technically can manually create integration schemas in Redshift and then grant Stitch access to those schemas, we don’t recommend it. We understand that security is a high priority and permissions should be carefully granted, but this approach will likely lead to headaches for you.

If you take this approach, you’ll need to first create the schema in your data warehouse and then grant the Stitch user the appropriate permissions to access it every time a new integration is added in Stitch. If this isn’t done, Stitch won’t be able to load data into your data warehouse.

Additionally, if there are any differences between the name of the schema you create in your data warehouse and the schema name you enter in Stitch when setting up an integration, Stitch won’t be able to load data into your data warehouse.

Create the Database User

Depending on the type of connection this is, the steps for creating a Redshift database user for Stitch will vary a bit:

  • If Stitch hasn’t ever replicated data to your Redshift destination, follow the instructions in the NEW Connections tab.
  • If Stitch has replicated data to your Redshift destination in the past, follow the instructions in the EXISTING Connections tab.

Note: Anything inside brackets - <like this> - is something you need to define when running the commands yourself.

  1. If you haven’t already, connect to your instance using your SQL client.

  2. After connecting, run this command to create the user:

    CREATE USER <stitch_username> WITH PASSWORD '<password>';
    
  3. Next, you’ll assign the CREATE permissions to the user. For <database_name>, enter the name of the database where all Stitch-replicated data should be loaded. If you launched a cluster using this tutorial, it’s the name of the database when you launched the cluster in Step 2.

    Note: This must be a pre-existing database.

    GRANT CREATE ON DATABASE <database_name> TO <stitch_username>;
    
  4. If you restricted access to the system tables, you’ll also need to run the following commands to grant the Stitch user SELECT permissions:

    GRANT SELECT ON ALL TABLES IN SCHEMA information_schema TO <stitch_username>
    
    GRANT SELECT ON ALL TABLES IN SCHEMA pg_catalog TO <stitch_username>
    

Note: Anything inside brackets - <like this> - is something you need to define when running the commands yourself.

  1. If you haven’t already, connect to your instance using your SQL client.

  2. After connecting, run this command to create the user:

    CREATE USER <stitch_username> WITH PASSWORD '<password>';
    
  3. Next, you’ll assign the CREATE permissions to the user. For <database_name>, enter the name of the database where all Stitch-replicated data should be loaded. If you launched a cluster using this tutorial, it’s the name of the database when you launched the cluster in Step 2.

    Note: This must be a pre-existing database.

    GRANT CREATE ON DATABASE <database_name> TO <stitch_username>;
    
  4. If you restricted access to the system tables, you’ll also need to run the following commands to grant the Stitch user SELECT permissions:

    GRANT SELECT ON ALL TABLES IN SCHEMA information_schema TO <stitch_username>
    
    GRANT SELECT ON ALL TABLES IN SCHEMA pg_catalog TO <stitch_username>
    

Transfer integration schema and table ownership

When an integration is initially connected to Stitch, a schema specific to that connection is created in your data warehouse. Ownership to those schemas and all the tables contained within them must be granted to the Stitch user for data replication to be successful.

  1. Query the catalog tables using this statement to list all tables and their owners:

    SELECT * 
    FROM pg_catalog.pg_tables;
    

    Take note of any integration tables that aren’t owned by the Stitch user - those are the tables you’ll need to update.

  2. Transfer ownership of the integration schema to the Stitch user:

    ALTER SCHEMA <schema_name> OWNER TO <stitch_username>;
    
  3. Next, you’ll transfer ownership of all tables in the integration schema to the Stitch user. There are two ways to accomplish this: table-by-table or running a script.

    Table-by-Table

    For every table in the schema not owned by the Stitch user - as outlined in step 1 - use this command to transfer ownership:

    ALTER TABLE <table_name> OWNER TO <stitch_username>;
    
    Script

    To transfer ownership automatically, run this script from the command line:

    for table in `psql -qAt --host [your.redshift.amazonaws.com] --port [port] --user [admin_user] -c "select tablename from pg_tables where schemaname = '[schema_name]';"  ` ; do psql -qAt --host [your.redshift.amazonaws.com] --port [port] --user [admin_user] -c "alter table \"[schema_name]\".\"$table\" owner to [stitch_username];" [database_name] ; done
    

Note: If you need to transfer ownership of more than one schema and its tables, you’ll need to repeat steps 2 and 3 for every integration schema.


Step 6: Connect Stitch

To complete the setup, you need to enter your Redshift connection details into the Destination Settings page in Stitch.

Step 6.1: Locate the Redshift Connection Details

  1. On the Redshift Dashboard page in the AWS Console, click the Clusters option in the left nav bar.
  2. In the Clusters page, locate and click on the cluster you created. This will display the Cluster Details page.

  3. Locate the following fields:
    • Endpoint
    • Database Name: This is the name of the database you want to connect to Stitch. This must be a pre-existing database - Stitch won’t create one for you. It could be:
      • The name of the database you used to launch the instance.
      • The name of another database within Redshift.
    • Master Username: You’ll only need this if you’re using the master user’s credentials to connect to Stitch.
    • Port: This is the port used by the database.

These fields are highlighted in the image below:

Redshift cluster details

Step 6.2: Enter connection details into Stitch

  1. If you aren’t signed into your Stitch account, sign in now.
  2. Click the Destination tab.
  3. Click the Redshift icon.
  4. Fill in the fields as follows:

    • Host (Endpoint): Paste the Endpoint address from the Redshift Details page in AWS into this field.

      Don’t include the port number, which is appended to the end of the endpoint string - this will cause errors. For example: if you used the default port, you’d remove the following from the address: :5439

    • Port: Enter the port used by the Redshift instance. The default is 5439.

    • Username: Enter the Stitch Redshift user’s username.

    • Password: Enter the password associated with the Stitch Redshift user.

    • Database: Enter the name of the Redshift database that you created for Stitch.

When finished, click Update Redshift Settings.

Stitch will perform a connection test to the Redshift database; if successful, a Success! mesage will display at the top of the screen. Note: This test may take a few minutes to complete.


Troubleshooting connection errors

If you receive an error message, we recommend trying these troubleshooting steps for common destination connection problems before reaching out to support.

We’ve found that the majority of the time, these resources can resolve most issues.


Learning about Stitch and Redshift

Now that your Redshift data warehouse is up and running, we recommend learning about how Stitch will load your data into it.

  • Redshift Destination Overview: This overview will give you the high-level basics for working with your Redshift data warehouse.

  • Destination Data Loading Guide: This article covers many of the common scenarios Stitch will encounter when loading data into your Redshift destination and the expected behavior for each scenario.


Questions? Feedback?

Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.