In this tutorial, we covered how to spin up a Redshift cluster with a public IP address to use with Stitch. If your Redshift cluster is in a private subnet, however, you can use an SSH tunnel to connect Stitch to your data warehouse.

The method you’ll use to accomplish this involves launching an EC2 instance into the VPC associated with the cluster. This EC2 instance will have a public IP address and act as a bastion, allowing you to forward traffic through an encrypted tunnel to your private Redshift cluster.

Redshift Setup Requirements

To set up Redshift in Stitch, you need:

  • An Amazon Web Services (AWS) account. Signing up is free - click here or go to https://aws.amazon.com to create an account if you don’t have one already.

  • Some technical know-how and familiarity with AWS.

  • Optional: Create a non-default VPC. A VPC, or Virtual Private Cloud, is a scalable virtual network that resembles a traditional network used in a data center.

    All AWS accounts created after December 3, 2014, include a default VPC in each AWS region. This means that you can use the default VPC to launch your Redshift cluster instead of spinning up your own.

    Steps for creating a VPC are outside the scope of this tutorial, so the examples in this guide will use a default VPC.

    This is optional, but if you want to use your own VPC, you need to create it before continuing. Amazon’s Getting Started with VPC guide can walk you through how to do this.

Additionally, note the following before you get started:

  • This tutorial assumes you’ll be using a Linux-based server to launch your instance and create the SSH tunnel.

  • An SSH tunnel isn’t necessarily more secure than a direct connection. An SSH tunnel is only as secure as the monitoring and hardening you perform on the SSH server hosting the tunnel.

If you have questions or concerns about Stitch security, please refer to the Security FAQ.


Step 1: Create cluster login credentials

To get started, sign into your AWS account and navigate to the Redshift Dashboard page. As of December 13, 2016, you can get there by clicking Services (top left corner), then clicking Redshift in the Databases section.

  1. On the Redshift Dashboard page, click the Lauch Cluster button.
  2. The Cluster Details page will display.
  3. Fill in the cluster details:
    • Cluster Identifier: This is the unique key that defines a cluster.
    • Database Name: This is the initial database for the cluster.
    • Port: A port is an opening in a firewall that allows in and outbound traffic. By default, Redshift uses port 5439.
  4. Next, you’ll create the master login credentials. You can use these credentials to connect Stitch to your Redshift cluster if you like, or if you want to be more selective with the permissions, you can create an additional user for Stitch later on in the tutorial.
    • Master User Name: This is the name of the master user for the cluster.
    • Master User Password: This is the password associated with the cluster’s master user.
  5. Re-enter the master user’s password to confirm it.
  6. Click Continue.

Step 2: Select nodes and cluster types

On the Node Configuration page, you’ll select the node type, define the cluster type, and decide how many nodes to use.

A node is a single computer that participates in a cluster. A Redshift cluster can have one to many nodes; the more nodes, the more data it can store and the faster it can process queries. Amazon currently offers four different types of nodes, each of which has its own CPU, RAM, storage capacity, and storage drive type. In addition, each node also has its own pricing plan.

The type and number of node(s) you choose when creating your cluster is dependent on your needs and dataset. We do, however, recommend you set up a multi-node configuration to provide data redundancy. For some guidance on choosing the number of nodes, click here to read the ‘Determining the Number of Nodes’ section of the Amazon Redshift guide.
  1. Select the Node Type you want to use.
  2. Select the Cluster Type.
  3. Enter the number of Compute Nodes you want to use.
  4. Click Continue.

Step 3: Configure and launch the Redshift cluster

Next, you’ll define the additional configuration settings for the Redshift cluster. For our purposes, we’re going to leave most of the settings as-is and focus on the Configure Networking Options and Security Groups sections.

Step 3.1: Configure networking options

In the Configure Networking Options section, define these fields:

  • Choose a VPC: From the drop-down, select the VPC you want to launch the cluster in. This will also cause the other required fields to display.

  • Cluster Subnet Group: Select a Subnet Group to associate with the cluster. This can be public or private.

    The default VPC included with all new AWS accounts also includes a default public subnet, which is what we’re using here.

  • Publicly Accessible: Set this to No.

Step 3.2: Define a Security Group

The last step to configuring your cluster is to define the Security Group associated with the cluster.

In the VPC Security Groups section, select the Security Group you want to associate with the cluster.

We’re using the default Security Group that’s included with all new AWS accounts, but you can choose any one you like. You’ll edit the access rules for the Security Group in the next section, which will allow the group to access the cluster.

Here’s what the configuration looks like for our sample cluster:

Redshift Additional Configuration page.

When finished, click the Continue button.

Step 3.3: Review and launch the Redshift cluster

A review page containing all the details about your cluster (except for the master password) will display after you click Continue on the Additional Configuration page.

Additionally, the pricing info for the cluster will display at the bottom of the page.

After you’ve reviewed and confirmed the settings, click Launch Cluster to launch the cluster. Note that this process can take a few minutes - the status in the Cluster Dashboard page will change to ‘available’ when the process completes.


Step 4: Create a bastion in your VPC

Next, you’ll launch an EC2 instance to serve as the SSH bastion. This publicly accessible instance will act as an intermediary, forwarding the traffic from Stitch through an encrypted tunnel to your private Redshift cluster.

Note: This instance must reside in the same VPC as the Redshift cluster.

Step 4.1: Configure the EC2 instance

The first part of creating a bastion in your VPC is configuring the instance.

  1. Navigate to the VPC Management Console in AWS. If you use the Services menu (top left corner), click the VPC option under the Networking & Content Delivery section.
  2. On the VPC Dashboard, click the Launch EC2 Instances button.
  3. Next, you’ll be asked to select the Amazon Machine Image, or AMI, that will be used to launch the instance.

    We’ll be using a Linux-based AMI (like Ubuntu) for this tutorial:

    Ubuntu Amazon Machine Image option in AWS

    Click the Select button next to the AMI you want to use.

  4. On the next page, you’ll select the instance type. Generally, a small instance will work just fine as a bastion. For example: t2.medium. You can find more info about instance types on Amazon’s website.

    After you select the instance type, click the Configure Instance Details button in the lower right corner of the page to continue.

  5. On the Configure Instance Details page, fill in the following fields:
    • Network - Select the same VPC that you launched your Redshift cluster in. This was what you selected for the Choose a VPC field in Step 3.
    • Subnet - Select the subnet you want to associate with the EC2 instance. We recommend using a public subnet - that is, a subnet with an Internet Gateway - as this will automatically assign a public IP address to machines in the subnet.

      If you’re not sure what subnet to use, simply leave this as the default.

    • Auto-assign public IP - Select Enable from the drop-down. This should be enabled to ensure your machine has a public IP address.

    Here’s a look at our setup:

    Configuring the EC2 Instance Details

  6. Click the Next: Add Storage button in the lower right corner of the page to continue.
  7. If you’re only using the machine as a bastion (which is what we’re doing in this tutorial), adding storage and tags are unnecessary. Skip over these pages until you reach the Configure Security Group page.

Step 4.2: Configure the EC2 instance's Security Group

The second part of creating a bastion in your VPC is configuring the security group. During this step, you’ll add the IP addresses that are allowed to access the bastion to the security group.

You can create a new Security Group or update an existing one. For this tutorial, we’ll create a new group.

  1. Select the Create a new security group option.
  2. In the Security group name field, enter Stitch Bastion or a unique name for the Security Group.
  3. In the table below the Description field, you’ll add Stitch’s IP addresses to the security group:
    • Type: If the default SSH port for your server is 22, set this to SSH.

      If it’s something else, set this to Custom TCP Rule.

    • Protocol: This will default to TCP - leave it as-is.
    • Port Range: This is the number of the SSH port associated with the bastion. If you selected SSH as the Type, this will default to 22.

      If you selected Custom TCP Rule, enter the number of the SSH port in this field.

    • Source: This should default to Custom. In the field next to the Source drop-down menu, paste one of the following IP addresses:

      • 52.23.137.21/32

      • 52.204.223.208/32

      • 52.204.228.32/32

      • 52.204.230.227/32

  4. Click the Add Rule button to add another rule.
  5. Repeat steps 3 and 4 until all of Stitch’s IP addresses have been added to the Security Group.

    Here’s what the Security Group rules should look like:

    Configuring the EC2 Instance Security Group

  6. When finished, click the Review and Launch button in the lower right corner of the page.

Step 4.3: Review and launch the EC2 instance

The last step is to review the settings for the EC2 instance and launch it.

Review the instance’s settings, paying particular attention to the fields highlighted in the image below:

Reviewing the EC2 Instance Details

  • The Security Groups section should list either a new Security Group for Stitch OR an existing group that contains group rules for Stitch’s IP addresses. If it doesn’t, refer to the Configure the EC2 Instance’s Security Group for instructions.
  • In the Instance Details section:
    • Network: This field should contain the ID of the same VPC that you launched the Redshift cluster in Step 3 of this tutorial.
    • Assign Public IP & Assign IPv6 IP: We strongly recommend using a public subnet with the instance and auto-assigning a public IP address. This will ensure that Stitch can access the instance.

After you’re reviewed the instance’s settings, click the Launch button in the lower right corner to launch the instance.

Note that it may take a few minutes for the instance creation process to complete. The status in the VPC Dashboard page will change to ‘available’ when the instance is ready.


Step 5: Enable the bastion to access the Redshift cluster

After the EC2 instance has finished initializing, you can move onto configuring the access rules for Redshift cluster. In this section, you’ll create a VPC Security Group that will forward traffic from the bastion (EC2 instance) to your private Redshift cluster.

Step 5.1: Retrieve the VPC's IPv4 CIDR

In this step, you’ll retrieve the bastion’s IP address, or IPv4 CIDR. This value will be followed by a slash and a number between 0 and 32. For example: 10.0.0.0/16

  1. Navigate to the VPC Management Console in AWS. If you use the Services menu (top left corner), click the VPC option under the Networking & Content Delivery section.
  2. On the VPC Dashboard, click the Your VPCs option under Virtual Private Cloud in the menu on the left side of the page.
  3. A list of all the VPCs you have access to in your AWS account will display. Locate the one you launched your Redshift cluster and the bastion in.
  4. Locate the IPv4 CIDR column.

    If this column isn’t in the table, simply click on the VPC to open its details in the bottom section of the page:

    VPC details & IPv4 CIDR

  5. Copy and paste the VPC’s IPv4 CIDR value somewhere convenient - you’ll need it in the next step.

Step 5.2: Create a VPC Security Group

Now that you’ve retrieved the bastion’s IP address, you can create a security group that will allow traffic from the bastion to access the Redshift cluster.

  1. From the VPC page, click the Security Groups option under Security in the menu on the left side of the page.
  2. Click the Create Security Group button.
  3. In the Create Security Group window:
    • Name tag: Enter a name tag if you want; otherwise, leave blank.
    • Group name: Enter Stitch, or a unique name for the Security Group.
    • Description: Enter a brief description of what the group is.
    • VPC: Verify that the VPC where you launched the Redshift cluster and bastion is selected in the drop-down.
  4. Click Yes, Create.
  5. After the Security Group is created, locate it in the list of Security Groups.
  6. Click on the Security Group.
  7. In the bottom section of the page - where the Security Group’s details are displayed - click the Inbound Rules tab.
  8. Click the Edit button to create an Inbound rule for the Security Group:
    • Type: Select Custom TCP Rule.
    • Protocol: This should default to TCP - leave it as-is.
    • Port Range: Enter the port used by the Redshift instance. Our instance is using the Redshift default (5439).
    • Source: Enter the bastion’s VPC IPv4 CIDR. Ex: 10.0.0.0/16

    Here’s what the Inbound rule should look like:

    VPC inbound Security Group rule

  9. When finished, click Save to create the rule.

Step 6: Create a Stitch Linux user

Next, you’ll retrieve your Public Key and create a Linux user on your server for Stitch.

Step 6.1: Retrieve your Public Key

The Stitch Public Key

The Public Key is used to authorize the Stitch Linux user. If the key isn’t properly installed, Stitch will be unable to access your database.

To retrieve the key:

  1. Sign into your Stitch account.

  2. Click the Destination tab.
  3. Click the Redshift icon.

  4. When the credentials page displays, click the Encryption Type menu and select the SSH Tunnel option.

  5. The Public Key will display, along with the other SSH fields.

Leave this page open for now - you’ll need it to wrap things up at the end.

Step 6.2: Create the Stitch Linux user

  1. Run the following commands as root on your Linux server to create a user named stitch:

    adduser --disabled-password stitch
    mkdir /home/stitch/.ssh
    
  2. Next, import the Public Key into authorized_keys, replacing [PASTE KEY HERE] with the Stitch Public Key:

    echo "[PASTE KEY HERE]" >> /home/[stitch_username]/.ssh/authorized_keys
    
  3. Alter the permissions on the /home/stitch directory to allow access via SSH:

    chown -R [stitch_username]:stitch /home/stitch
    chmod -R 700 /home/stitch/.ssh
    

Step 7: Create a Stitch Redshift User

  1. If you haven’t already, connect to your Redshift instance using your SQL client.

  2. After connecting, run this command to create the user:

    CREATE USER <stitch_username> WITH PASSWORD '<password>';
  3. Next, you’ll assign the CREATE permissions to the user. For <database_name>, enter the name of the database where all Stitch-replicated data should be loaded.

    Note: This must be a pre-existing database.

    If you launched a cluster using this tutorial, <database_name> is the name of the database when you launched the cluster in Step 2.

    GRANT CREATE ON DATABASE <database_name> TO <stitch_username>
  4. If you restricted access to the system tables, you’ll also need to run the following commands to grant the Stitch user SELECT permissions.

    GRANT SELECT ON ALL TABLES IN SCHEMA information_schema TO <stitch_username>
    
    GRANT SELECT ON ALL TABLES IN SCHEMA pg_catalog TO <stitch_username>

In the table below are the database user privileges Stitch requires to connect to and load data into Redshift.

Privilege name Reason for requirement
CREATE ON DATABASE

Required to create the necessary database objects to load and store your data.

CREATE permissions on the database are required to successfully load data. When Stitch loads data, it will run a CREATE SCHEMA IF NOT EXISTS command, which will create a schema if it doesn’t already exist. To run this command, the Stitch user must have the CREATE ON DATABASE permission.

Note: The CREATE ON SCHEMA permission is not a sufficient alternative for CREATE ON DATABASE. As outlined in Redshift’s documentation, this permission only allows a user to create objects within a schema, but not the schema itself.

Ownership of all schemas and tables that Stitch will load to

Stitch occasionally needs to run COMMENT and ALTER TABLE commands to properly insert data.

If the database user you use to connect to Redshift creates the schemas and tables, you will not need to grant explicit ownership permissions. If another user creates the schemas and tables prior to Stitch loading data into them, you will need to alter ownership to the Stitch database user. Failure to do so will result in replication issues.

Note: Stitch uses table comments to store Primary Key information. Altering or removing table comments will lead to replication issues.

SELECT ON ALL TABLES IN information_schema

Required to select rows from tables in the information_schema schema. Prior to loading data, Stitch will use the data in this schema to verify the existence and structure of integration schemas and tables.

Note: Stitch will only ever read data from systems tables.

SELECT ON ALL TABLES IN pg_catalog

Required to select rows from tables in the pg_catalog schema. Prior to loading data, Stitch will use the data in this schema to verify the existence and structure of integration schemas and tables.

Note: Stitch will only ever read data from systems tables.


Step 8: Connect Stitch

To complete the setup, you need to enter your Redshift connection details into the Destination Settings page in Stitch.

Step 8.1: Locate the Redshift connection details

  1. On the Redshift Dashboard page in the AWS Console, click the Clusters option in the left nav bar.
  2. In the Clusters page, locate and click on the cluster you created. This will display the Cluster Details page.

  3. Locate the following fields:
    • Endpoint
    • Database Name: This is the name of the database you want to connect to Stitch. This must be a pre-existing database - Stitch won’t create one for you. It could be:
      • The name of the database you used to launch the instance.
      • The name of another database within Redshift.
    • Master Username: You’ll only need this if you’re using the master user’s credentials to connect to Stitch.
    • Port: This is the port used by the database.

These fields are highlighted in the image below:

Redshift cluster details

Step 8.2: Enter connection details into Stitch

  1. If you aren’t signed into your Stitch account, sign in now.
  2. Click the Destination tab.
  3. Click the Redshift icon.
  4. Fill in the fields as follows:

    • Host (Endpoint): Paste the Endpoint address from the Redshift Details page in AWS into this field.

      Don’t include the port number, which is appended to the end of the endpoint string - this will cause errors. For example: if you used the default port, you’d remove the following from the address: :5439

    • Port: Enter the port used by the Redshift instance. The default is 5439.

    • Username: Enter the Stitch Redshift user’s username.

    • Password: Enter the password associated with the Stitch Redshift user.

    • Database: Enter the name of the Redshift database that you created for Stitch.

Lastly, you’ll fill in the SSH fields with the bastion details.

  1. Click the Encryption Type menu.
  2. Select SSH Tunnel.
  3. The following fields will display - fill them in as follows:

    • Remote Address: Enter the bastion box’s IP address. This is the IPv4 CIDR you retrieved in Step 5.
    • SSH Port: Enter the bastion box’s port. If you’re using the default, this will be 22.
    • SSH User: Enter the Stitch Linux (SSH) user’s username.

When finished, click Update Redshift Settings.

Stitch will perform a connection test to the Redshift database; if successful, a Success! mesage will display at the top of the screen. Note: This test may take a few minutes to complete.


Troubleshooting connection errors

If you receive an error message, we recommend trying these troubleshooting steps for common destination connection problems before reaching out to support.

We’ve found that the majority of the time, these resources can resolve most issues.


Learning about Stitch and Redshift

Now that your Redshift data warehouse is up and running, we recommend learning about how Stitch will load your data into it.

  • Redshift Destination Overview: This overview will give you the high-level basics for working with your Redshift data warehouse.

  • Destination Data Loading Guide: This article covers many of the common scenarios Stitch will encounter when loading data into your Redshift destination and the expected behavior for each scenario.


Questions? Feedback?

Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.