Prerequisites

  • An Amazon Web Services (AWS) account. Signing up is free - click here or go to https://aws.amazon.com to create an account if you don’t have one already.

  • Some technical know-how and familiarity with AWS.

  • Optional: Create a non-default VPC. A VPC, or Virtual Private Cloud, is a scalable virtual network that resembles a traditional network used in a data center.

    All AWS accounts created after December 3, 2014, include a default VPC in each AWS region. This means that you can use the default VPC to launch your Redshift cluster instead of spinning up your own.

    Steps for creating a VPC are outside the scope of this tutorial, so the examples in this guide will use a default VPC.

    This is optional, but if you want to use your own VPC, you need to create it before continuing. Amazon’s Getting Started with VPC guide can walk you through how to do this. Note that the cluster you use for Stitch must reside in a publicly accessible subnet.


Step 1: Create cluster login credentials

To get started, sign into your AWS account and navigate to the Redshift Dashboard page. As of December 13, 2016, you can get there by clicking Services (top left corner), then clicking Redshift in the Databases section.

  1. On the Redshift Dashboard page, click the Lauch Cluster button.
  2. The Cluster Details page will display.
  3. Fill in the cluster details:
    • Cluster Identifier: This is the unique key that defines a cluster.
    • Database Name: This is the initial database for the cluster.
    • Port: A port is an opening in a firewall that allows in and outbound traffic. By default, Redshift uses port 5439.
  4. Next, you’ll create the master login credentials. You can use these credentials to connect Stitch to your Redshift cluster if you like, or if you want to be more selective with the permissions, you can create an additional user for Stitch later on in the tutorial.
    • Master User Name: This is the name of the master user for the cluster.
    • Master User Password: This is the password associated with the cluster’s master user.
  5. Re-enter the master user’s password to confirm it.
  6. Click Continue.

Step 2: Select nodes and cluster types

On the Node Configuration page, you’ll select the node type, define the cluster type, and decide how many nodes to use.

  1. Select the Node Type you want to use.
  2. Select the Cluster Type.
  3. Enter the number of Compute Nodes you want to use.
  4. Click Continue.

Step 3: Configure and launch the cluster

Next, you’ll define the additional configuration settings for the Redshift cluster. This guide will leave most of the settings as-is and focus on the Configure Networking Options and Security Groups sections.

Step 3.1: Configure networking options

In the Configure Networking Options section, define these fields:

  • Choose a VPC: From the drop-down, select the VPC you want to launch the cluster in. This will also cause the other required fields to display.
  • Cluster Subnet Group: Set this to a public subnet in your account. Note a public subnet is required for the connection to be successful.

    The default VPC included with all new AWS accounts also includes a default public subnet, which is what we’re using here.

  • Publicly Accessible: Set this to Yes. This must be set to Yes for Stitch to access your Redshift instance.

Step 3.2: Define a Security Group

The last step to configuring your cluster is to define the Security Group associated with the cluster.

In the VPC Security Groups section, select the Security Group you want to associate with the cluster.

We’re using the default Security Group that’s included with all new AWS accounts, but you can choose any one you like. You’ll edit the access rules for the Security Group in the next section, which will allow the group to access the cluster.

Here’s what the configuration looks like for our sample cluster:

Redshift Additional Configuration page.

When finished, click the Continue button.

Step 3.3: Review and launch the Redshift cluster

A review page containing all the details about your cluster (except for the master password) will display after you click Continue on the Additional Configuration page.

Additionally, the pricing info for the cluster will display at the bottom of the page.

After you’ve reviewed and confirmed the settings, click Launch Cluster to launch the cluster. Note that this process can take a few minutes - the status in the Cluster Dashboard page will change to ‘available’ when the process completes.


Step 4: Configure security and access settings

When the cluster creation process is complete, you can move onto editing the cluster security group’s access rules.

To ensure that Stitch can access your Redshift instance and load data into your data warehouse, you’ll create a Security Group Connection Rule for the security group associated with the cluster. This will whitelist the IP addresses Stitch uses and ensure we can access your Redshift.

  1. On the Redshift Dashboard page, click the Clusters option in the left nav bar.
  2. In the Clusters page, locate and click on the cluster you just created. This will display the Cluster Details page.
  3. Locate the Cluster Properties section.
  4. Click the name of the security group in the VPC Security Groups field.
  5. This will open the Security Groups page.
  6. The Security Group associated with your instance should automatically open in the Security Groups page. If it doesn’t, click the name of the appropriate group.
  7. The Security Group’s details will display in tabs in the bottom part of the page. Click the Inbound tab:

    The Security Group page in AWS

  8. In the Inbound tab, click the Edit button. This will open the Edit inbound rules window.
  9. This is where you’ll add the IP addresses that can access the instance. Fill out the fields in this screen as follows:
    • Type: Select Custom TCP Rule from the drop-down.
    • Protocol: This will default to TCP - leave it as-is.
    • Port Range: Enter the port associated with the instance. This will be the same port you used to create the instance. (`` is the default)
    • Source: The drop-down will default to Custom - leave it as-is.

      In the field next to the drop-down, copy and paste one of the following IP addresses. You need to add them one at a time, meaning each IP address gets its own rule:

      • 52.23.137.21/32

      • 52.204.223.208/32

      • 52.204.228.32/32

      • 52.204.230.227/32

      Below is an example of what a rule should look like:

      Whitelisting Stitch IP addresses through Inbound Security Group rules.

  10. Click Add Rule to add an additional Inbound rule.
  11. Repeat the previous two steps until all of Stitch’s IP addresses (along with any others you may want to whitelist) have been added.
  12. Click Save.

Step 5: Create a Stitch user

In the following tabs are the instructions for creating a Stitch database user and explanations for the permissions Stitch requires.

  1. If you haven’t already, connect to your Redshift instance using your SQL client.

  2. After connecting, run this command to create the user:

    CREATE USER <stitch_username> WITH PASSWORD '<password>';
  3. Next, you’ll assign the CREATE permissions to the user. For <database_name>, enter the name of the database where all Stitch-replicated data should be loaded.

    Note: This must be a pre-existing database.

    If you launched a cluster using this tutorial, <database_name> is the name of the database when you launched the cluster in Step 2.

    GRANT CREATE ON DATABASE <database_name> TO <stitch_username>
  4. If you restricted access to the system tables, you’ll also need to run the following commands to grant the Stitch user SELECT permissions.

    GRANT SELECT ON ALL TABLES IN SCHEMA information_schema TO <stitch_username>
    
    GRANT SELECT ON ALL TABLES IN SCHEMA pg_catalog TO <stitch_username>

In the table below are the database user privileges Stitch requires to connect to and load data into Redshift.

Privilege name Reason for requirement
CREATE ON DATABASE

Required to create the necessary database objects to load and store your data.

CREATE permissions on the database are required to successfully load data. When Stitch loads data, it will run a CREATE SCHEMA IF NOT EXISTS command, which will create a schema if it doesn’t already exist. To run this command, the Stitch user must have the CREATE ON DATABASE permission.

Note: The CREATE ON SCHEMA permission is not a sufficient alternative for CREATE ON DATABASE. As outlined in Redshift’s documentation, this permission only allows a user to create objects within a schema, but not the schema itself.

Ownership of all schemas and tables that Stitch will load to

Stitch occasionally needs to run COMMENT and ALTER TABLE commands to properly insert data.

If the database user you use to connect to creates the schemas and tables, you will not need to grant explicit ownership permissions. If another user creates the schemas and tables prior to Stitch loading data into them, you will need to alter ownership to the Stitch database user. Failure to do so will result in replication issues.

Note: Stitch uses table comments to store Primary Key information. Altering or removing table comments will lead to replication issues.

SELECT ON ALL TABLES IN information_schema

Required to select rows from tables in the information_schema schema. Prior to loading data, Stitch will use the data in this schema to verify the existence and structure of integration schemas and tables.

Note: Stitch will only ever read data from systems tables.

SELECT ON ALL TABLES IN pg_catalog

Required to select rows from tables in the pg_catalog schema. Prior to loading data, Stitch will use the data in this schema to verify the existence and structure of integration schemas and tables.

Note: Stitch will only ever read data from systems tables.


Step 6: Connect Stitch

To complete the setup, you need to enter your connection details into the Destination Settings page in Stitch.

Step 6.1: Locate the connection details

  1. On the Redshift Dashboard page in the AWS Console, click the Clusters option in the left nav bar.
  2. In the Clusters page, locate and click on the cluster you created. This will display the Cluster Details page.

  3. Locate the following fields:
    • Endpoint
    • Database Name: This is the name of the database you want to connect to Stitch. This must be a pre-existing database - Stitch won’t create one for you. It could be:
      • The name of the database you used to launch the instance.
      • The name of another database within .
    • Master Username: You’ll only need this if you’re using the master user’s credentials to connect to Stitch.
    • Port: This is the port used by the database.

These fields are highlighted in the image below:

Redshift cluster details

Step 6.2: Enter connection details into Stitch

  1. If you aren’t signed into your Stitch account, sign in now.
  2. Click the Destination tab.
  3. Click the ** icon.
  4. Fill in the fields as follows:

    • Host (Endpoint): Paste the Endpoint address from the Details page in AWS into this field.

      Don’t include the port number, which is appended to the end of the endpoint string - this will cause errors. For example: if you used the default port, you’d remove the following from the address: :

    • Port: Enter the port used by the instance. The default is ``.

    • Username: Enter the Stitch user’s username.

    • Password: Enter the password associated with the Stitch user.

    • Database: Enter the name of the database that you created for Stitch.

When finished, click Update Settings.

Stitch will perform a connection test to the database; if successful, a Success! mesage will display at the top of the screen. Note: This test may take a few minutes to complete.



Questions? Feedback?

Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.