Snowflake is a SQL data warehouse built from the ground up for the cloud, designed with a patented new architecture to handle today’s and tomorrow’s data and analytics.
A fully-managed SaaS data warehouse solution, Snowflake runs on Amazon Web Services cloud infrastructure: AWS EC2 virtual compute instances are used for compute needs, while S3 is utilized for persistent data storage.
In this guide, we'll walk you through:
- Snowflake's pricing model,
- Some high-level limitations (including any incompatible data sources),
- How to spin up a Snowflake data warehouse of your own, and
- How Stitch loads and organizes data in Snowflake.
Snowflake pricing is based on two factors: the volume or data stored in your Snowflake destination and the amount of compute usage (the time the server runs) in seconds.
Snowflake offers two types of plans, each with varying levels of access and features. There are On Demand plans which are commitment-free and usage-based. The alternative is a Capacity option, which guarantees secure price discounts. Learn more about Snowflake plans and pricing here.
Snowflake Warehouse Sizes
Snowflake data warehouses can be different sizes - X-Small, Large, and 3X-Large, for example - which defines how many servers will comprise each cluster in a warehouse.
While the size of a warehouse can impact the time required to execute queries, bigger doesn’t always mean better. Warehouse size is directly tied to the number of credits used, which will directly impact your Snowflake costs. Learn more about Snowflake warehouse sizes here.
To help you select the warehouse size that fits your needs and budget, check out Snowflake’s Warehouse Considerations guide before getting started.
Automated Warehouse Management
To reduce usage, you can elect to automate the management of your Snowflake warehouse. This means that you can elect to suspend the warehouse when there’s no activity after a specified period of time, and then automatically resume when there is. Note that these settings apply to the entire warehouse and not individual clusters.
Enabling these settings depends on your workload and availability needs. Learn more about the Auto Suspend and Auto Resume features here.
Additionally, note that Stitch will only ever impact your Snowflake usage when loading data.
Every database has its own supported limits and way of handling data, and Snowflake is no different. The table below provides a very high-level look at what Snowflake supports, including any possible incompatibilities with Stitch’s integration offerings.
Snowflake’s full list; Stitch reserves
|Table Name Length||255 characters||Column Name Length||251 characters|
|Max # of Columns||None||VARCHAR Max Width||16MB|
|Nested Structure Support||Native Support||Case||Case Insensitive|
Not sure if Snowflake is the data warehouse for you? Check out the Choosing a Stitch Destination guide to compare each of Stitch’s destination offerings.
Using the SQL Worksheet feature in Snowflake’s web interface, you can spin up a Snowflake data warehouse for Stitch in just a few minutes.
After you’ve successfully connected your Snowflake data warehouse to Stitch, you can start adding integrations and replicating data.
For each integration that you add to Stitch, a schema specific to that integration will be created in your data warehouse. This is where all the tables for that inegration will be stored.
Stitch will encounter dozens of scenarios when replicating and loading your data. To learn more about how Snowflake handles these scenarios, check out the Data Loading Guide for Snowflake.
Rejected Records Log
Occasionally, Stitch will encounter data that it can’t load into the data warehouse. For example: a table contains more columns than Snowflake’s allowed limit of None columns per table.
When this happens, the data will be “rejected” and logged in a table called
_sdc_rejected. Every integration schema created by Stitch will include this table as well as the other tables in the integration.
Stitch replicates data from your sources based on the integration’s Replication Frequency and the Replication Method used by the tables in the integration. In Stitch, you have the ability to control what and how often data is replicated for the majority of integrations.
The time from the sync start to data being loaded into your data warehouse can vary depending on a number of factors, especially for initial historical loads.
To learn more about Stitch’s replication process and how loading time can be affected, check out the Stitch Replication Process guide.