In this tutorial, we covered how to spin up a Redshift cluster with a public IP address to use with Stitch. If your Redshift cluster is in a private subnet, however, you can use an SSH tunnel to connect Stitch to your data warehouse.

The method you’ll use to accomplish this involves launching an EC2 instance into the VPC associated with the cluster. This EC2 instance will have a public IP address and act as a bastion, allowing you to forward traffic through an encrypted tunnel to your private Redshift cluster.

Redshift Setup Requirements

To set up Redshift in Stitch, you need:

  • An Amazon Web Services (AWS) account. Signing up is free - click here or go to https://aws.amazon.com to create an account if you don’t have one already.

  • Some technical know-how and familiarity with AWS.

  • Optional: Create a non-default VPC. A VPC, or Virtual Private Cloud, is a scalable virtual network that resembles a traditional network used in a data center.

    All AWS accounts created after December 3, 2014, include a default VPC in each AWS region. This means that you can use the default VPC to launch your Redshift cluster instead of spinning up your own.

    Steps for creating a VPC are outside the scope of this tutorial, so the examples in this guide will use a default VPC.

    This is optional, but if you want to use your own VPC, you need to create it before continuing. Amazon’s Getting Started with VPC guide can walk you through how to do this.

Additionally, note the following before you get started:

  • This tutorial assumes you’ll be using a Linux-based server to launch your instance and create the SSH tunnel.

  • An SSH tunnel isn’t necessarily more secure than a direct connection. An SSH tunnel is only as secure as the monitoring and hardening you perform on the SSH server hosting the tunnel.

If you have questions or concerns about Stitch security, please refer to the Security FAQ.


Step 1: Create cluster login credentials

To get started, sign into your AWS account and navigate to the Redshift Dashboard page. As of December 13, 2016, you can get there by clicking Services (top left corner), then clicking Redshift in the Databases section.

  1. On the Redshift Dashboard page, click the Lauch Cluster button.
  2. The Cluster Details page will display.
  3. Fill in the cluster details:
    • Cluster Identifier: This is the unique key that defines a cluster.
    • Database Name: This is the initial database for the cluster.
    • Port: A port is an opening in a firewall that allows in and outbound traffic. By default, Redshift uses port 5439.
  4. Next, you’ll create the master login credentials. You can use these credentials to connect Stitch to your Redshift cluster if you like, or if you want to be more selective with the permissions, you can create an additional user for Stitch later on in the tutorial.
    • Master User Name: This is the name of the master user for the cluster.
    • Master User Password: This is the password associated with the cluster’s master user.
  5. Re-enter the master user’s password to confirm it.
  6. Click Continue.

Step 2: Select nodes and cluster types

On the Node Configuration page, you’ll select the node type, define the cluster type, and decide how many nodes to use.

A node is a single computer that participates in a cluster. A Redshift cluster can have one to many nodes; the more nodes, the more data it can store and the faster it can process queries. Amazon currently offers four different types of nodes, each of which has its own CPU, RAM, storage capacity, and storage drive type. In addition, each node also has its own pricing plan.

The type and number of node(s) you choose when creating your cluster is dependent on your needs and dataset. We do, however, recommend you set up a multi-node configuration to provide data redundancy. For some guidance on choosing the number of nodes, click here to read the ‘Determining the Number of Nodes’ section of the Amazon Redshift guide.
  1. Select the Node Type you want to use.
  2. Select the Cluster Type.
  3. Enter the number of Compute Nodes you want to use.
  4. Click Continue.

Step 3: Configure and launch the Redshift cluster

Next, you’ll define the additional configuration settings for the Redshift cluster. For our purposes, we’re going to leave most of the settings as-is and focus on the Configure Networking Options and Security Groups sections.

Step 3.1: Configure networking options

In the Configure Networking Options section, define these fields:

  • Choose a VPC: From the drop-down, select the VPC you want to launch the cluster in. This will also cause the other required fields to display.

  • Cluster Subnet Group: Select a Subnet Group to associate with the cluster. This can be public or private.

    The default VPC included with all new AWS accounts also includes a default public subnet, which is what we’re using here.

  • Publicly Accessible: Set this to No.

Step 3.2: Define a Security Group

The last step to configuring your cluster is to define the Security Group associated with the cluster.

In the VPC Security Groups section, select the Security Group you want to associate with the cluster.

We’re using the default Security Group that’s included with all new AWS accounts, but you can choose any one you like. You’ll edit the access rules for the Security Group in the next section, which will allow the group to access the cluster.

Here’s what the configuration looks like for our sample cluster:

Redshift Additional Configuration page.

When finished, click the Continue button.

Step 3.3: Review and launch the Redshift cluster

A review page containing all the details about your cluster (except for the master password) will display after you click Continue on the Additional Configuration page.

Additionally, the pricing info for the cluster will display at the bottom of the page.

After you’ve reviewed and confirmed the settings, click Launch Cluster to launch the cluster. Note that this process can take a few minutes - the status in the Cluster Dashboard page will change to ‘available’ when the process completes.


Step 4: Create a bastion in your VPC

Next, you’ll launch an EC2 instance to serve as the SSH bastion. This publicly accessible instance will act as an intermediary, forwarding the traffic from Stitch through an encrypted tunnel to your private Redshift cluster.

Note: This instance must reside in the same VPC as the Redshift cluster.

Step 4.1: Configure the EC2 instance

The first part of creating a bastion in your VPC is configuring the instance.

  1. Navigate to the VPC Management Console in AWS. If you use the Services menu (top left corner), click the VPC option under the Networking & Content Delivery section.
  2. On the VPC Dashboard, click the Launch EC2 Instances button.
  3. Next, you’ll be asked to select the Amazon Machine Image, or AMI, that will be used to launch the instance.

    We’ll be using a Linux-based AMI (like Ubuntu) for this tutorial:

    Ubuntu Amazon Machine Image option in AWS

    Click the Select button next to the AMI you want to use.

  4. On the next page, you’ll select the instance type. Generally, a small instance will work just fine as a bastion. For example: t2.medium. You can find more info about instance types on Amazon’s website.

    After you select the instance type, click the Configure Instance Details button in the lower right corner of the page to continue.

  5. On the Configure Instance Details page, fill in the following fields:
    • Network - Select the same VPC that you launched your Redshift cluster in. This was what you selected for the Choose a VPC field in Step 3.
    • Subnet - Select the subnet you want to associate with the EC2 instance. We recommend using a public subnet - that is, a subnet with an Internet Gateway - as this will automatically assign a public IP address to machines in the subnet.

      If you’re not sure what subnet to use, simply leave this as the default.

    • Auto-assign public IP - Select Enable from the drop-down. This should be enabled to ensure your machine has a public IP address.

    Here’s a look at our setup:

    Configuring the EC2 Instance Details

  6. Click the Next: Add Storage button in the lower right corner of the page to continue.
  7. If you’re only using the machine as a bastion (which is what we’re doing in this tutorial), adding storage and tags are unnecessary. Skip over these pages until you reach the Configure Security Group page.

Step 4.2: Configure the EC2 instance's Security Group

The second part of creating a bastion in your VPC is configuring the security group. During this step, you’ll add the IP addresses that are allowed to access the bastion to the security group.

You can create a new Security Group or update an existing one. For this tutorial, we’ll create a new group.

  1. Select the Create a new security group option.
  2. In the Security group name field, enter Stitch Bastion or a unique name for the Security Group.
  3. In the table below the Description field, you’ll add Stitch’s IP addresses to the security group:
    • Type: If the default SSH port for your server is 22, set this to SSH.

      If it’s something else, set this to Custom TCP Rule.

    • Protocol: This will default to TCP - leave it as-is.
    • Port Range: This is the number of the SSH port associated with the bastion. If you selected SSH as the Type, this will default to 22.

      If you selected Custom TCP Rule, enter the number of the SSH port in this field.

    • Source: This should default to Custom. In the field next to the Source drop-down menu, paste one of the following IP addresses:

      • 52.23.137.21/32

      • 52.204.223.208/32

      • 52.204.228.32/32

      • 52.204.230.227/32

  4. Click the Add Rule button to add another rule.
  5. Repeat steps 3 and 4 until all of Stitch’s IP addresses have been added to the Security Group.

    Here’s what the Security Group rules should look like:

    Configuring the EC2 Instance Security Group

  6. When finished, click the Review and Launch button in the lower right corner of the page.

Step 4.3: Review and launch the EC2 instance

The last step is to review the settings for the EC2 instance and launch it.

Review the instance’s settings, paying particular attention to the fields highlighted in the image below:

Reviewing the EC2 Instance Details

  • The Security Groups section should list either a new Security Group for Stitch OR an existing group that contains group rules for Stitch’s IP addresses. If it doesn’t, refer to the Configure the EC2 Instance’s Security Group for instructions.
  • In the Instance Details section:
    • Network: This field should contain the ID of the same VPC that you launched the Redshift cluster in Step 3 of this tutorial.
    • Assign Public IP & Assign IPv6 IP: We strongly recommend using a public subnet with the instance and auto-assigning a public IP address. This will ensure that Stitch can access the instance.

After you’re reviewed the instance’s settings, click the Launch button in the lower right corner to launch the instance.

Note that it may take a few minutes for the instance creation process to complete. The status in the VPC Dashboard page will change to ‘available’ when the instance is ready.


Step 5: Enable the bastion to access the Redshift cluster

After the EC2 instance has finished initializing, you can move onto configuring the access rules for Redshift cluster. In this section, you’ll create a VPC Security Group that will forward traffic from the bastion (EC2 instance) to your private Redshift cluster.

Step 5.1: Retrieve the VPC's IPv4 CIDR

In this step, you’ll retrieve the bastion’s IP address, or IPv4 CIDR. This value will be followed by a slash and a number between 0 and 32. For example: 10.0.0.0/16

  1. Navigate to the VPC Management Console in AWS. If you use the Services menu (top left corner), click the VPC option under the Networking & Content Delivery section.
  2. On the VPC Dashboard, click the Your VPCs option under Virtual Private Cloud in the menu on the left side of the page.
  3. A list of all the VPCs you have access to in your AWS account will display. Locate the one you launched your Redshift cluster and the bastion in.
  4. Locate the IPv4 CIDR column.

    If this column isn’t in the table, simply click on the VPC to open its details in the bottom section of the page:

    VPC details & IPv4 CIDR

  5. Copy and paste the VPC’s IPv4 CIDR value somewhere convenient - you’ll need it in the next step.

Step 5.2: Create a VPC Security Group

Now that you’ve retrieved the bastion’s IP address, you can create a security group that will allow traffic from the bastion to access the Redshift cluster.

  1. From the VPC page, click the Security Groups option under Security in the menu on the left side of the page.
  2. Click the Create Security Group button.
  3. In the Create Security Group window:
    • Name tag: Enter a name tag if you want; otherwise, leave blank.
    • Group name: Enter Stitch, or a unique name for the Security Group.
    • Description: Enter a brief description of what the group is.
    • VPC: Verify that the VPC where you launched the Redshift cluster and bastion is selected in the drop-down.
  4. Click Yes, Create.
  5. After the Security Group is created, locate it in the list of Security Groups.
  6. Click on the Security Group.
  7. In the bottom section of the page - where the Security Group’s details are displayed - click the Inbound Rules tab.
  8. Click the Edit button to create an Inbound rule for the Security Group:
    • Type: Select Custom TCP Rule.
    • Protocol: This should default to TCP - leave it as-is.
    • Port Range: Enter the port used by the Redshift instance. Our instance is using the Redshift default (5439).
    • Source: Enter the bastion’s VPC IPv4 CIDR. Ex: 10.0.0.0/16

    Here’s what the Inbound rule should look like:

    VPC inbound Security Group rule

  9. When finished, click Save to create the rule.

Step 6: Create a Stitch Linux user

Next, you’ll retrieve your Public Key and create a Linux user on your server for Stitch.

Step 6.1: Retrieve your Public Key

The Stitch Public Key

The Public Key is used to authorize the Stitch Linux user. If the key isn’t properly installed, Stitch will be unable to access your database.

To retrieve the key:

  1. Sign into your Stitch account.

  2. Click the Destination tab.
  3. Click the Redshift icon.

  4. When the credentials page displays, click the Encryption Type menu and select the SSH Tunnel option.

  5. The Public Key will display, along with the other SSH fields.

Leave this page open for now - you’ll need it to wrap things up at the end.

Step 6.2: Create the Stitch Linux user

Note: Anything inside square brackets - [like this] - is something you need to define when running the commands yourself.

  1. To create the new user, run the following commands as root on your Linux server:

    adduser --disabled-password [stitch_username]
    mkdir /home/[stitch_username]/.ssh
    
  2. Next, import the Public Key into authorized_keys. This will ensure the Stitch user has access to the database.

    Copy the entire key into the authorized_keys file by:

    "[PASTE KEY HERE]" >> /home/[stitch_username]/.ssh/authorized_keys
    
  3. Alter the permissions on the /home/[stitch_username] directory to allow access via SSH:

    chown -R [stitch_username]:[stitch_username] /home/[stitch_username]
    chmod -R 700 /home/[stitch_username]/.ssh
    

Step 7: Create a Stitch Redshift User

As we mentioned when you created the instance, the credentials used to launch belong to the master user, or admin.

If you want to control the permission settings, you’ll need to create an additional database user for Stitch. An added benefit is that Stitch will be easily distinguishable in any logs or audits.

Requirements for creating a Redshift user

You must be a superuser to create a Redshift user. If you’re not a superuser or you’re not sure what this means, please contact someone on your technical team before proceeding.

Required Redshift permissions for Stitch

The Stitch database user requires the following to successfully load data into Redshift:

Action Reason
CREATE schemas, tables, and views Stitch requires these permissions to create the necessary database objects to load and store your data.
SELECT from tables in the pg_catalog and information_schema schemas Stitch uses the data in the systems tables to verify the existence of integration schemas and tables, and gain information about their structure prior to loading data.

Note: Stitch will only ever read data from the systems tables.
Ownership of all integration schemas and tables Stitch occasionally needs to run COMMENT and ALTER TABLE commands to properly insert data.

While you technically can manually create integration schemas in Redshift and then grant Stitch access to those schemas, we don’t recommend it. We understand that security is a high priority and permissions should be carefully granted, but this approach will likely lead to headaches for you.

If you take this approach, you’ll need to first create the schema in your data warehouse and then grant the Stitch user the appropriate permissions to access it every time a new integration is added in Stitch. If this isn’t done, Stitch won’t be able to load data into your data warehouse.

Additionally, if there are any differences between the name of the schema you create in your data warehouse and the schema name you enter in Stitch when setting up an integration, Stitch won’t be able to load data into your data warehouse.

Create the database user

Depending on the type of connection this is, the steps for creating a Redshift database user for Stitch will vary a bit:

  • If Stitch hasn’t ever replicated data to your Redshift destination, follow the instructions in the NEW Connections tab.
  • If Stitch has replicated data to your Redshift destination in the past, follow the instructions in the EXISTING Connections tab.

Note: Anything inside square brackets - [like this] - is something you need to define when running the commands yourself.

  1. If you haven’t already, connect to your Redshift instance using your SQL client.2. After connecting, run this command to create the user:

    CREATE USER [stitch_username] WITH PASSWORD '[password]';
    
  2. Next, you’ll assign the CREATE permissions to the user. For [database_name], enter the name of the database where all Stitch-replicated data should be loaded. If you launched a cluster using this tutorial, it’s the name of the database when you launched the cluster in Step 2.

    Note: This must be a pre-existing database.

    GRANT CREATE ON DATABASE[database_name] TO [stitch_username];
    
  3. If you restricted access to the system tables, you’ll also need to run the following commands to grant the Stitch user SELECT permissions:

    GRANT SELECT ON ALL TABLES IN SCHEMA information_schema TO [stitch_username]
    
    GRANT SELECT ON ALL TABLES IN SCHEMA pg_catalog TO [stitch_username]
    

Note: Anything inside square brackets - [like this] - is something you need to define when running the commands yourself.

  1. If you haven’t already, connect to your Redshift instance using your SQL client.2. After connecting, run this command to create the user:

    CREATE USER [stitch_username] WITH PASSWORD '[password]';
    
  2. Next, you’ll assign the CREATE permissions to the user. For [database_name], enter the name of the database where all Stitch-replicated data should be loaded. If you launched a cluster using this tutorial, it’s the name of the database when you launched the cluster in Step 2.

    Note: This must be a pre-existing database.

    GRANT CREATE ON DATABASE[database_name] TO [stitch_username];
    
  3. If you restricted access to the system tables, you’ll also need to run the following commands to grant the Stitch user SELECT permissions:

    GRANT SELECT ON ALL TABLES IN SCHEMA information_schema TO [stitch_username]
    
    GRANT SELECT ON ALL TABLES IN SCHEMA pg_catalog TO [stitch_username]
    

Transfer integration schema and table ownership

When an integration is initially connected to Stitch, a schema specific to that connection is created in your data warehouse. Ownership to those schemas and all the tables contained within them must be granted to the Stitch user for data replication to be successful.

  1. Query the catalog tables using this statement to list all tables and their owners:

    SELECT * 
    FROM pg_catalog.pg_tables;
    

    Take note of any integration tables that aren’t owned by the Stitch user - those are the tables you’ll need to update.

  2. Transfer ownership of the integration schema to the Stitch user:

    ALTER SCHEMA [schema_name] OWNER TO [stitch_username];
    
  3. Next, you’ll transfer ownership of all tables in the integration schema to the Stitch user. There are two ways to accomplish this: table-by-table or running a script.

    Table-by-Table

    For every table in the schema not owned by the Stitch user - as outlined in step 1 - use this command to transfer ownership:

    ALTER TABLE [table_name] OWNER TO  [stitch_username];
    
    Script

    To transfer ownership automatically, run this script from the command line:

    for table in `psql -qAt --host [your.redshift.amazonaws.com] --port [port] --user [admin_user] -c "select tablename from pg_tables where schemaname = '[schema_name]';" [database_name]` ; do psql -qAt --host [your.redshift.amazonaws.com] --port [port] --user [admin_user] -c "alter table \"[schema_name]\".\"$table\" owner to [stitch_username];" [database_name] ; done
    

Note: If you need to transfer ownership of more than one schema and its tables, you’ll need to repeat steps 2 and 3 for every integration schema.


Step 8: Connect Stitch

To complete the setup, you need to enter your Redshift connection details into the Destination Settings page in Stitch.

Step 8.1: Locate the Redshift connection details

  1. On the Redshift Dashboard page in the AWS Console, click the Clusters option in the left nav bar.
  2. In the Clusters page, locate and click on the cluster you created. This will display the Cluster Details page.

  3. Locate the following fields:
    • Endpoint
    • Database Name: This is the name of the database you want to connect to Stitch. This must be a pre-existing database - Stitch won’t create one for you. It could be:
      • The name of the database you used to launch the instance.
      • The name of another database within Redshift.
    • Master Username: You’ll only need this if you’re using the master user’s credentials to connect to Stitch.
    • Port: This is the port used by the database.

These fields are highlighted in the image below:

Redshift cluster details

Step 8.2: Enter connection details into Stitch

  1. If you aren’t signed into your Stitch account, sign in now.
  2. Click the Destination tab.
  3. Click the Redshift icon.
  4. Fill in the fields as follows:

    • Host (Endpoint): Paste the Endpoint address from the Redshift Details page in AWS into this field.

      Don’t include the port number, which is appended to the end of the endpoint string - this will cause errors. For example: if you used the default port, you’d remove the following from the address: :5439

    • Port: Enter the port used by the Redshift instance. The default is 5439.

    • Username: Enter the Stitch Redshift user’s username.

    • Password: Enter the password associated with the Stitch Redshift user.

    • Database: Enter the name of the Redshift database that you created for Stitch.

Lastly, you’ll fill in the SSH fields with the bastion details.

  1. Click the Encryption Type menu.
  2. Select SSH Tunnel.
  3. The following fields will display - fill them in as follows:

    • Remote Address: Enter the bastion box’s IP address. This is the IPv4 CIDR you retrieved in Step 5.
    • SSH Port: Enter the bastion box’s port. If you’re using the default, this will be 22.
    • SSH User: Enter the Stitch Linux (SSH) user’s username.

When finished, click Update Redshift Settings.

Stitch will perform a connection test to the Redshift database; if successful, a Success! mesage will display at the top of the screen. Note: This test may take a few minutes to complete.


Troubleshooting connection errors

If you receive an error message, we recommend trying these troubleshooting steps for common destination connection problems before reaching out to support.

We’ve found that the majority of the time, these resources can resolve most issues.


Learning about Stitch and Redshift

Now that your Redshift data warehouse is up and running, we recommend learning about how Stitch will load your data into it.

  • Redshift Destination Overview: This overview will give you the high-level basics for working with your Redshift data warehouse.

  • Destination Data Loading Guide: This article covers many of the common scenarios Stitch will encounter when loading data into your Redshift destination and the expected behavior for each scenario.


Questions? Feedback?

Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.