Snowflake is a SQL data warehouse built from the ground up for the cloud, designed with a patented new architecture to handle today’s and tomorrow’s data and analytics.
A fully-managed SaaS data warehouse solution, Snowflake runs on Amazon Web Services cloud infrastructure: AWS EC2 virtual compute instances are used for compute needs, while S3 is utilized for persistent data storage.
This guide serves as a reference for version 1 of Stitch’s Snowflake destination.
Details and features
High-level details about Stitch’s implementation of Snowflake, such as supported connection methods, availability on Stitch plans, etc.
|Stitch plan availability||
All Stitch plans
|Connect API availability||
This version of the Snowflake destination can be created and managed using Stitch’s Connect API. Learn more.
Stitch does not support using SSH tunnels to connect to Snowflake destinations.
Stitch will attempt to use SSL to connect by default. No additional configuration is needed.
Virtual Private Network (VPN) connections may be implemented as part of an Enterprise plan. Contact Stitch Sales for more info.
|Default loading behavior||
|Nested structure support||
Details about the destination, including object names, table and column limits, reserved keywords, etc.
Note: Exceeding the limits noted below will result in loading errors or rejected data.
|Maximum record size||
|Table name length||
|Column name length||
|Maximum table size||
|Maximum tables per database||
Refer to the Reserved keywords documentation.
Snowflake pricing is based on two factors: The volume or data stored in your Snowflake destination and the amount of compute usage (the time the server runs) in seconds.
Snowflake offers two types of plans, each with varying levels of access and features. There are On Demand plans which are commitment-free and usage-based. The alternative is a Capacity option, which guarantees secure price discounts. Learn more about Snowflake plans and pricing here.
Snowflake warehouse sizes
Snowflake data warehouses can be different sizes - X-Small, Large, and 3X-Large, for example - which defines how many servers will comprise each cluster in a warehouse.
While the size of a warehouse can impact the time required to execute queries, bigger doesn’t always mean better. Warehouse size is directly tied to the number of credits used, which will directly impact your Snowflake costs. Learn more about Snowflake warehouse sizes here.
To help you select the warehouse size that fits your needs and budget, check out Snowflake’s Warehouse Considerations guide before getting started.
Automated warehouse management
To reduce usage, you can elect to automate the management of your Snowflake warehouse. This means that you can elect to suspend the warehouse when there’s no activity after a specified period of time, and then automatically resume when there is. Note that these settings apply to the entire warehouse and not individual clusters.
Enabling these settings depends on your workload and availability needs. Learn more about the Auto Suspend and Auto Resume features here.
Note: Stitch will only ever impact your Snowflake usage when loading data.
Replication process overview
A Stitch replication job consists of three stages:
Step 1: Data extraction
Stitch requests and extracts data from a data source. Refer to the System overview guide for a more detailed explanation of the Extraction phase.
Step 2: Preparation
During this phase, the extracted data is buffered in Stitch’s durable, highly available internal data pipeline and readied for loading. Refer to the System overview guide for a more detailed explanation of the Preparation phase.
Step 3: Loading
Stitch loads the data into Snowflake.
By default, Stitch will use Upsert loading when loading data into Snowflake.
If the conditions for Upsert loading aren’t met, data will be loaded using Append-Only loading.
Refer to the Understanding loading behavior guide for more info and examples.
Stitch requires Primary Keys to de-dupe incrementally replicated data. When tables are created in the destination, Stitch will apply Primary Key constraints to columns used as Primary Keys. Primary Key constraints require that column values be unique and not null.
No compatibility issues have been discovered between Snowflake and Stitch's integration offerings.
System tables and columns
Stitch will create the following tables in each integration’s dataset:
Additionally, Stitch will insert system columns (prepended with
_sdc) into each table.
Stitch converts data types only where needed to ensure the data is accepted by Snowflake. In the table below are the data types Stitch supports for Snowflake destinations, and the Stitch types they map to.
- Stitch type: The Stitch data type the source type was mapped to. During the Extraction and Preparing phases, Stitch identifies the data type in the source and then maps it to a common Stitch data type.
- Destination type: The destination-compatible data type the Stitch type maps to. This is the data type Stitch will use to store data in Snowflake.
- Notes: Details about the data type and/or its allowed values in the destination, if available. If a range is available, values that exceed the noted range will be rejected by Snowflake.
|Stitch type||Destination type||Notes|
When Stitch replicates source data containing objects or arrays, Stitch will load the data intact into a
VARIANT column. This is a Snowflake data type that can contain semi-structured data like JSON arrays and objects.
You can then use Snowflake’s functions for semi-structured data to parse the data. Refer to Snowflake’s documentation for more info.
Column names in Snowflake:
- Must contain only letters (a-z, A-Z), numbers (0-9), or underscores (
- Must begin with a letter or underscore
- Must be less than the maximum length of 251 characters. Columns that exceed this limit will be rejected by Snowflake.
- Must not be prefixed or suffixed with any of Snowflake’s or Stitch’s reserved keyword prefixes or suffixes
Stitch will perform the following transformations to ensure column names adhere to the rules imposed by Snowflake:
|Transformation||Source column||Destination column|
|Convert lowercase and mixed case to uppercase||
|Remove special characters||
|Remove non-letter leading characters||
Snowflake will store the value as
TIMESTAMP_TZ(9) and express it as UTC.
Not sure if Snowflake is the data warehouse for you? Check out the Choosing a Stitch Destination guide to compare each of Stitch’s destination offerings.