Connecting a Databricks Delta Lake on AWS Destination to Stitch

Prerequisites

An Amazon Web Services (AWS) account with a Databricks Delta Lake (AWS) deployment. Instructions for configuring a Databricks Delta Lake (AWS) deployment are outside the scope of this tutorial; our instructions assume that you have Databricks Delta Lake (AWS) up and running. Refer to Databricks’ documentation for help configuring your AWS account with Databricks.
An existing Amazon S3 bucket that must be:
- In the same region as your Stitch account. For example: If your Stitch account uses the North America (us-east-1) data pipeline region, your S3 bucket must also be in us-east-1. Here’s how to verify your Stitch account’s data pipeline region.
- In the same AWS account as the Databricks deployment, or covered by a cross-account bucket policy that allows access to the bucket from the AWS account with the Databricks deployment.
Permissions to manage S3 buckets in AWS. Your AWS user must be able to add and modify bucket policies in the AWS account or accounts where the S3 bucket and Databricks deployment reside.

Step 1: Configure S3 bucket access in AWS

Important: The S3 bucket you use must be in the same region as your Stitch account. Using a bucket in another region will result in errors in Stitch.
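As a rough pre-flight check, the region requirement can be expressed in a few lines of Python. This is only a sketch: the function names are hypothetical, and it assumes you have already looked up the bucket's location yourself (for example with `aws s3api get-bucket-location`), which reports buckets in us-east-1 with an empty location value.

```python
from typing import Optional


def normalize_bucket_region(location_constraint: Optional[str]) -> str:
    """Turn an S3 bucket location value into a plain region name."""
    # S3 reports buckets in us-east-1 with an empty (null) location constraint.
    if not location_constraint:
        return "us-east-1"
    return location_constraint


def bucket_matches_stitch_region(location_constraint: Optional[str],
                                 stitch_region: str) -> bool:
    """Return True when the bucket lives in the Stitch data pipeline region."""
    return normalize_bucket_region(location_constraint) == stitch_region
```

For example, `bucket_matches_stitch_region(None, "us-east-1")` is true, while a bucket reported as `eu-central-1` would not match a North America (us-east-1) Stitch account.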

Step 1.1: Grant Stitch access to your Amazon S3 bucket
Step 1.2: Grant Databricks access to your Amazon S3 bucket

Step 1.1: Grant Stitch access to your Amazon S3 bucket

To allow Stitch to access the bucket, you’ll need to add a bucket policy using the AWS console. Follow the instructions in the tab below to add the bucket policy.

Instructions
Privileges list

Why does Stitch require these permissions? For an explanation of why Stitch requires each permission outlined here, see the Privileges list tab.

Sign into your Amazon Web Services (AWS) account as a user with privileges that allow you to manage S3 buckets.
Click Services near the top-left corner of the page.
Under the Storage option, click S3.
A page listing all buckets currently in use will display. Click the name of the bucket that is used with Databricks.
Click the Permissions tab.
In the Permissions tab, click the Bucket Policy button.

In the Bucket policy editor, paste the bucket policy for your Stitch data pipeline region and replace <YOUR_BUCKET_NAME> with the name of your S3 bucket.

Not sure what your Stitch data pipeline region is? Click here for help.

My Stitch data pipeline region is North America (us-east-1).

North America (us-east-1) bucket policy

{
  "Version": "2012-10-17",
  "Id": "",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::218546966473:role/LoaderDelta"
        ]
      },
      "Action": [
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::<YOUR_BUCKET_NAME>",
        "arn:aws:s3:::<YOUR_BUCKET_NAME>/*"
      ]
    }
  ]
}

My Stitch data pipeline region is Europe (eu-central-1).

Europe (eu-central-1) bucket policy

{
  "Version": "2012-10-17",
  "Id": "",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::218546966473:role/LoaderDelta_eu_central_1"
        ]
      },
      "Action": [
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::<YOUR_BUCKET_NAME>",
        "arn:aws:s3:::<YOUR_BUCKET_NAME>/*"
      ]
    }
  ]
}

When finished, click Save.
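If you would rather generate the policy than edit the placeholder by hand, something like the following may help. It is a minimal sketch: the function and constant names are hypothetical, and the role ARNs and actions are copied from the two policies above. It only produces the JSON text; you still paste the result into the Bucket policy editor.

```python
import json

# Stitch loader role per data pipeline region, copied from the policies above.
STITCH_LOADER_ROLES = {
    "us-east-1": "arn:aws:iam::218546966473:role/LoaderDelta",
    "eu-central-1": "arn:aws:iam::218546966473:role/LoaderDelta_eu_central_1",
}


def build_stitch_bucket_policy(bucket_name: str, region: str) -> str:
    """Return the bucket policy JSON for the given bucket and Stitch region."""
    role_arn = STITCH_LOADER_ROLES[region]  # raises KeyError for other regions
    policy = {
        "Version": "2012-10-17",
        "Id": "",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                "Principal": {"AWS": [role_arn]},
                "Action": [
                    "s3:DeleteObject",
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:PutObject",
                ],
                # The bucket itself (for ListBucket) and every object in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

Calling `build_stitch_bucket_policy("stitch-databricks-delta-bucket", "us-east-1")` returns text equivalent to the North America policy with the bucket name filled in.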

The table below lists the Amazon S3 permissions Stitch requires on the bucket to load data into Databricks Delta Lake (AWS).

Privilege name	Reason for requirement
s3:DeleteObject	Required to remove obsolete staging tables during loading.
s3:GetObject	Required to read objects in an S3 bucket. Granting the `s3:GetObject` privilege in a bucket policy allows the user to perform the following operations: GET Object, HEAD Object
s3:ListBucket	Required to determine if an S3 bucket exists, if access to the bucket is allowed, and to list the objects in the bucket. Granting the `s3:ListBucket` privilege in a bucket policy allows the user to perform the following operations: GET Bucket (List Objects), HEAD Bucket
s3:PutObject	Required to add objects, such as files, to an S3 bucket. Granting the `s3:PutObject` privilege in a bucket policy allows the user to perform the following operations: PUT Object, POST Object, Initiate Multipart Upload, Upload Part, Complete Multipart Upload, PUT Object - Copy
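To confirm that a saved policy grants all four privileges in the table, a quick local check along these lines can be useful. It is a simplified sketch with a hypothetical function name: it only looks at the `Action` lists of `Allow` statements and does not evaluate wildcards, principals, resources, or `Deny` statements the way AWS does.

```python
import json

# The four privileges listed in the table above.
REQUIRED_ACTIONS = {
    "s3:DeleteObject",
    "s3:GetObject",
    "s3:ListBucket",
    "s3:PutObject",
}


def missing_stitch_actions(policy_json: str) -> set:
    """Return the required actions that no Allow statement mentions."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    granted = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be a bare string
            actions = [actions]
        granted.update(actions)
    return REQUIRED_ACTIONS - granted
```

An empty set means every privilege from the table appears in the policy; anything returned is a privilege that still needs to be added.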

Step 1.2: Grant Databricks access to your Amazon S3 bucket

Next, you’ll configure your AWS account to allow access from Databricks by creating an IAM role and policy. This is required to complete loading data into Databricks Delta Lake (AWS).

Follow steps 1-4 in Databricks’ documentation to create the IAM policy and role for Databricks.

Step 2: Configure access in Databricks

Step 2.1: Add the Databricks S3 IAM role to Databricks
Step 2.2: Create a Databricks cluster
Step 2.3: Retrieve the Databricks cluster’s JDBC URL
Step 2.4: Generate a Databricks access token

Step 2.1: Add the Databricks S3 IAM role to Databricks

Follow step 5 in this Databricks guide to add the IAM role you created for Databricks in Step 1.2 to your Databricks account.

After the Databricks IAM role has been added using the Databricks Admin Console, proceed to the next step.

Step 2.2: Create a Databricks cluster

Note: You’ll need the Allow Cluster Creation privilege in Databricks to complete this step.

Sign into your Databricks account.
Click the Clusters option on the left side of the page.
Click the + Create Cluster button.
In the Cluster Name field, enter a name for the cluster.
In the Databricks Runtime Version field, select a version that’s 6.3 or higher. This is required for Databricks Delta Lake (AWS) to work with Stitch.
In the Advanced Options section, locate the IAM Role field.
In the dropdown menu, select the Databricks IAM role you added to your account in the previous step.
When finished, click the Create Cluster button to create the cluster.
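The "6.3 or higher" requirement is easy to get wrong if versions are compared as decimals (6.10 is newer than 6.3). The sketch below shows one way to compare them; the function name is hypothetical, and it assumes the runtime label begins with a `major.minor` number, as in the illustrative strings `6.3` or `7.3 LTS`.

```python
import re

# Minimum Databricks Runtime version stated in this guide.
MINIMUM_RUNTIME = (6, 3)


def runtime_is_supported(version: str) -> bool:
    """Return True if the runtime label starts with a version >= 6.3."""
    match = re.match(r"(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"Unrecognized runtime version: {version!r}")
    # Compare as integer tuples so that 6.10 sorts after 6.3.
    found = (int(match.group(1)), int(match.group(2)))
    return found >= MINIMUM_RUNTIME
```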

Step 2.3: Retrieve the Databricks cluster's JDBC URL

Next, you’ll retrieve your Databricks cluster’s JDBC URL.

On the Clusters page in Databricks, click the cluster you created in the previous step.
Open the Advanced Options section.
Click the JDBC/ODBC tab.
Locate the JDBC URL field and copy the value.

Keep this handy - you’ll need it to complete the setup in Stitch.
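Before pasting the URL into Stitch, it can help to confirm that you copied the whole value. The sketch below splits a JDBC URL into its host, port, and semicolon-separated options. It assumes the URL starts with `jdbc:spark://` and uses `key=value` options; the exact prefix and option names can vary with the driver version, and the function name is hypothetical.

```python
def parse_databricks_jdbc_url(jdbc_url: str) -> dict:
    """Split a JDBC URL into host, port, and a dict of its options."""
    prefix = "jdbc:spark://"  # assumed prefix; adjust if your URL differs
    if not jdbc_url.startswith(prefix):
        raise ValueError("Expected a URL starting with 'jdbc:spark://'")
    # Everything before the first ';' is host:port/schema, the rest is options.
    location, _, raw_options = jdbc_url[len(prefix):].partition(";")
    host_port = location.split("/", 1)[0]
    host, _, port = host_port.partition(":")
    options = {}
    for pair in raw_options.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            options[key] = value
    return {"host": host, "port": port or None, "options": options}
```

If the parsed result is missing a host or an `httpPath` option, the value was probably truncated when copying.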

Step 2.4: Generate a Databricks access token

Click the user profile icon in the upper right corner of your Databricks workspace.
Click User Settings.
Click the Access Tokens tab.
In the tab, click the Generate New Token button.


In the window that displays, enter the following:
- Comment: Stitch destination
- Lifetime (days): Leave this field blank. If you enter a value, your token will eventually expire and break the connection to Stitch.
Click Generate.


Copy the token somewhere secure. Databricks will only display the token once.
Click Done after you copy the token.
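If you want to confirm the token works before entering it in Stitch, one option is a small authenticated request to the Databricks REST API. The sketch below only builds the request and does not send it. The function name, workspace URL, and token value are placeholders, and the choice of the clusters list endpoint is an assumption made for illustration; any endpoint your user can read would serve.

```python
import urllib.request


def build_token_check_request(workspace_url: str,
                              token: str) -> urllib.request.Request:
    """Build (but do not send) a request that lists clusters with the token."""
    url = workspace_url.rstrip("/") + "/api/2.0/clusters/list"
    return urllib.request.Request(
        url,
        # Personal access tokens are sent as a Bearer token.
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
```

Passing the returned request to `urllib.request.urlopen` sends it; an authorization error at that point suggests the token was copied incorrectly or has been revoked.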

Step 3: Connect Stitch

If you aren’t signed into your Stitch account, sign in now.
Click the Destination tab.
Locate and click the Databricks Delta Lake (AWS) icon.
Fill in the fields as follows:
- Display Name: Enter a display name for your destination, to distinguish various connections of the same type.
- Description (optional): Enter a description for your destination.
- Access Token: Paste the access token you generated in Step 2.4.
- JDBC URL: Paste the JDBC URL you retrieved in Step 2.3.
- Bucket Name: Enter the name of the Amazon S3 bucket you configured in Step 1. Enter only the bucket name: No URLs, https, or S3 parts. For example: stitch-databricks-delta-bucket

When finished, click Check and Save.

Stitch will perform a connection test to the Databricks Delta Lake (AWS) database; if successful, a Success! message will display at the top of the screen. Note: This test may take a few minutes to complete.
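If the connection test fails, the Bucket Name field is worth a second look, since it must hold only the bare name. The check below is a sketch with a hypothetical function name. It applies the general S3 naming rules (3 to 63 characters; lowercase letters, digits, hyphens, and dots; starting and ending with a letter or digit) and also rejects values that look like an S3 hostname.

```python
import re

# General S3 bucket naming rules: 3-63 characters, lowercase letters, digits,
# hyphens and dots, beginning and ending with a letter or digit.
BUCKET_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_bare_bucket_name(value: str) -> bool:
    """Return True if the value looks like a bucket name and not a URL."""
    # Schemes and paths contain ':' or '/', which the pattern does not allow.
    if BUCKET_NAME_PATTERN.fullmatch(value) is None:
        return False
    # A hostname such as <bucket>.s3.amazonaws.com would pass the pattern,
    # so reject it explicitly.
    return not value.endswith(".amazonaws.com")
```

With this check, `stitch-databricks-delta-bucket` passes, while `s3://stitch-databricks-delta-bucket` and `https://stitch-databricks-delta-bucket.s3.amazonaws.com` do not.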
