tl;dr We believe that transformation is best done in your cloud data warehouse, so we focus on E and L.
A few years ago, the only data warehouses available were expensive, on-premises appliances, and it took organizations weeks or months to add capacity. In that world, it made sense to do extract, transform, and load in that order. ETL tools built 10 or more years ago were designed to do as much prep work as possible, including transformation, before loading data into the data warehouse. Today, however, cloud data warehouses like Amazon Redshift, Google BigQuery, Microsoft Azure Synapse Analytics, and Snowflake can scale up and down elastically in seconds or minutes, so you can skip the preload transformations and load all of your raw data directly into your data warehouse. You can then define transformations in SQL and run them in the data warehouse at query time.
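As a rough illustration of the "load raw data first, transform in SQL later" approach described above, here is a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in for a cloud data warehouse; the table `raw_orders`, the view `revenue_by_customer`, and the sample rows are all hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for the data warehouse (assumption).
conn = sqlite3.connect(":memory:")

# E and L: land the raw data untouched, with no preload transformation.
conn.execute(
    "CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount_cents INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
    [
        (1, "acme", 1250, "paid"),
        (2, "acme", 750, "paid"),
        (3, "globex", 3000, "refunded"),
        (4, "globex", 500, "paid"),
    ],
)

# T: define the transformation in SQL as a view; it runs at query time.
conn.execute(
    """
    CREATE VIEW revenue_by_customer AS
    SELECT customer, SUM(amount_cents) / 100.0 AS revenue
    FROM raw_orders
    WHERE status = 'paid'
    GROUP BY customer
    """
)

rows = conn.execute(
    "SELECT customer, revenue FROM revenue_by_customer ORDER BY customer"
).fetchall()
print(rows)  # [('acme', 20.0), ('globex', 5.0)]
```

Because the raw table is kept as loaded, the view can be redefined later without re-extracting anything from the source.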
You may also be interested in a blog post our CTO wrote on this subject.
Some of the most common reasons our customers choose Stitch are:
Our self-serve plans are tiered by data volume. In all of our plans, you can use as many of our integrations as you like at no extra cost. If you're already using Stitch for one data source, we encourage you to add more.
Our Enterprise plans are custom-built based on the needs of your organization. If you’re interested in an enterprise-grade ETL platform for your mission-critical data, please contact our Sales team for more details.
Within a few minutes. If you need a data latency SLA, please contact our Sales team for more details.
You can specify the Replication Frequency on an integration-by-integration basis, which determines how often Stitch will attempt to extract data from each data source.
There are three paths for adding new integrations. If you need an integration for a new data source immediately, you can build integrations using the open source Singer framework, and they'll run in Stitch; check out the Singer Getting Started guide, and bring any questions you have to the Singer Slack group. You can also work with one of our implementation partners, which are experienced in building custom integrations for use with Stitch. Finally, we can include custom integration development and commercial support for community-developed integrations for Enterprise customers.
Singer is an open source platform that lets anyone write and collaborate on scripts that move data between databases, web APIs, files, queues, and just about anything else you can think of. You can submit Singer integrations to our Product team for inclusion in Stitch; once accepted, you can use Stitch to run any integration written in the Singer format. By running a Singer integration on Stitch's platform you get auto-scaling, a secure infrastructure, credential management, monitoring, and alerting. Singer integrations can also be run on hardware that you manage.
Singer is made up of three parts:
All taps and targets can be mixed and matched, so changing the destination you're loading data into is easy. Since it's all open source, community members can leverage each other's improvements.
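To make the tap-and-target idea concrete, here is a minimal sketch of the two sides exchanging JSON messages, one per line. It assumes the Singer message types `SCHEMA`, `RECORD`, and `STATE`; the stream name `users`, the helper names `tap_lines` and `target_load`, and the shape of the state value are hypothetical. A real tap writes these lines to standard output and a real target reads them from standard input; here they are passed in memory to keep the example self-contained.

```python
import json

def tap_lines(rows, stream="users"):
    """Yield Singer-style messages for `rows`, one JSON document per line."""
    yield json.dumps({
        "type": "SCHEMA",
        "stream": stream,
        "schema": {
            "type": "object",
            "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
        },
        "key_properties": ["id"],
    })
    for row in rows:
        yield json.dumps({"type": "RECORD", "stream": stream, "record": row})
    if rows:
        # The state value is free-form; this bookmark layout is an assumption.
        yield json.dumps({"type": "STATE", "value": {stream: {"last_id": rows[-1]["id"]}}})

def target_load(lines):
    """Consume messages and return (records grouped by stream, last state seen)."""
    tables, state = {}, None
    for line in lines:
        message = json.loads(line)
        if message["type"] == "SCHEMA":
            tables.setdefault(message["stream"], [])
        elif message["type"] == "RECORD":
            tables.setdefault(message["stream"], []).append(message["record"])
        elif message["type"] == "STATE":
            state = message["value"]
    return tables, state

rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}]
tables, state = target_load(tap_lines(rows))
```

Since the only contract between the two functions is the line-delimited message format, either side can be swapped for another implementation, which is what makes mixing and matching possible.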
Stitch is architected to prevent data loss or duplication. We buffer data once it's in the pipeline, so if a data warehouse gets disconnected, nothing will be lost as long as it's reconnected before the buffer expires. Most customers have a two-week buffer; Enterprise customers can define custom data retention policies and expiration intervals.
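The buffering behavior can be pictured with a small sketch. This is not Stitch's implementation; the class `PipelineBuffer` and its methods are hypothetical, and the 14-day retention simply mirrors the two-week figure mentioned above.

```python
from collections import deque
from datetime import datetime, timedelta

class PipelineBuffer:
    """Hold records while the destination is unavailable (illustrative only)."""

    def __init__(self, retention: timedelta):
        self.retention = retention
        self._items = deque()  # (received_at, record) pairs, oldest first

    def add(self, record, received_at: datetime) -> None:
        self._items.append((received_at, record))

    def drain(self, now: datetime) -> list:
        """Return the records still inside the retention window and empty the buffer."""
        kept = [rec for ts, rec in self._items if now - ts <= self.retention]
        self._items.clear()
        return kept

t0 = datetime(2024, 1, 1)

# Destination reconnects after 13 days: the record is still available.
buffer = PipelineBuffer(retention=timedelta(days=14))
buffer.add({"id": 1}, received_at=t0)
recovered = buffer.drain(now=t0 + timedelta(days=13))

# Destination reconnects after 15 days: the record has expired.
late_buffer = PipelineBuffer(retention=timedelta(days=14))
late_buffer.add({"id": 1}, received_at=t0)
expired = late_buffer.drain(now=t0 + timedelta(days=15))
```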
How Stitch handles data structure changes in a data source varies depending on the integration, as well as the replication method used for a given table within that integration.
Stitch codes schema definitions for most SaaS integrations based on the API documentation for those data sources. Changes to that structure would require redevelopment of that integration within Stitch. We do this development on behalf of all our customers when one of our Stitch-certified data sources deprecates an old version of its API.
For database integrations, and some SaaS integrations that support custom fields, such as Salesforce and Zuora, we interpret the schema using the system tables in the source instance. At extraction time, Stitch first performs a "structure sync," during which we detect the structure of the source instance and persist that information to the Tables to Replicate page for your integration.
From there, the way we handle structural changes is influenced by the replication method defined for a given table. For tables that use our key-based incremental replication method, we can make changes in the destination based on the structure changes. We will append new columns, split the destination columns to accommodate new data types, or no longer load data to a column if it has been removed from the source, as explained in our data loading guide for each destination.
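The three behaviors described above (appending columns, splitting columns on a type change, and leaving removed columns empty) can be sketched as follows. This is an illustration under stated assumptions, not Stitch's loader: the function names, the coarse type names, and the `field__type` naming for split columns are hypothetical, and the actual conventions are the ones in the data loading guide for each destination.

```python
from __future__ import annotations
from typing import Any

def type_name(value: Any) -> str:
    """Map a Python value to a coarse destination type name (illustrative)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"

def load_record(columns: dict[str, str], record: dict[str, Any]) -> dict[str, Any]:
    """Build the destination row for `record`, widening `columns` in place."""
    values: dict[str, Any] = {}
    for field, value in record.items():
        if value is None:
            continue
        kind = type_name(value)
        if field not in columns:
            columns[field] = kind            # new source column: append it
            target = field
        elif columns[field] == kind:
            target = field                   # same type: load as usual
        else:
            target = f"{field}__{kind}"      # new type: split into a sibling column
            columns.setdefault(target, kind)
        values[target] = value
    # Columns absent from the record (e.g. removed at the source) stay NULL.
    return {name: values.get(name) for name in columns}

columns: dict[str, str] = {}
row1 = load_record(columns, {"id": 1, "price": 10})
row2 = load_record(columns, {"id": 2, "price": "ten", "note": "x"})
row3 = load_record(columns, {"id": 3})
```

After these three calls, `columns` holds `id`, `price`, `price__string`, and `note`; the third row carries a value only for `id`.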
We don't currently support structural changes to tables that use our log-based incremental replication method. We use JSON Schema validation during extraction to make sure our customers' data is always loaded accurately, and several database binary logs don't include columnar information in the log files we read from. A change in the schema for one of these tables generally leads to critical errors during extraction, and requires a full re-replication of the source table before replication for that integration can proceed. This limitation of log-based replication is explained in more detail in our documentation.
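To see why a schema change is hard to absorb when log entries carry no column information, consider this simplified sketch. It assumes a log row arrives as a bare tuple of values that must be matched by position against a previously detected schema; `SCHEMA`, `decode_row`, and `SchemaMismatch` are hypothetical names, and the type check is a hand-rolled stand-in for real JSON Schema validation.

```python
class SchemaMismatch(Exception):
    """Raised when a log row no longer fits the schema detected earlier."""

# Ordered (column name, expected Python type) pairs from an earlier structure sync.
SCHEMA = [("id", int), ("email", str)]

def decode_row(values, schema):
    """Pair positional log values with column names, validating as we go."""
    if len(values) != len(schema):
        raise SchemaMismatch(
            f"expected {len(schema)} columns, got {len(values)}"
        )
    record = {}
    for (name, expected), value in zip(schema, values):
        if value is not None and not isinstance(value, expected):
            raise SchemaMismatch(f"column {name!r} is not {expected.__name__}")
        record[name] = value
    return record

# A row written before the source table changed decodes cleanly.
ok = decode_row((1, "a@example.com"), SCHEMA)

# After a column is added at the source, rows carry an extra, unnamed value.
try:
    decode_row((1, "a@example.com", "2024-01-01"), SCHEMA)
    failed = False
except SchemaMismatch:
    failed = True
```

With only positions to go on, there is no safe way to tell which column the extra value belongs to, so the sketch stops rather than guess.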
Both Certified and Community integrations offer a number of benefits:
The key difference is that Stitch provides commercial support for Certified integrations but not for Community integrations – though support for Community integrations can be included in Enterprise contracts. Commercial support is a guarantee that the Stitch team will fix bugs and adapt to new versions of third-party APIs. Maintenance of Community integrations is handled by members of the Singer open source community.
Sign up, add a data source and a destination, and you’re ready to go. We offer an unlimited 14-day trial, so feel free to connect all of your systems to Stitch. Our Getting Started guide walks you through the process, and our Support team can help out with any bumps you hit along the way.