Q: Where's the T in your ETL tool?
tl;dr: We believe that transformation is best done in your cloud data warehouse, so we focus on E and L.
A few years ago, the only data warehouses available were expensive on-premises appliances, and it took weeks or months for organizations to add capacity. In that world, it made sense to extract, transform, and load, in that order. ETL tools built 10 or more years ago were designed to do as much prep work as possible, including transformation, before loading data into the data warehouse. Today, however, cloud data warehouses like Amazon Redshift, Google BigQuery, Microsoft Azure SQL Data Warehouse, and Snowflake can scale up and down elastically in seconds or minutes, so you can skip preload transformations and load all of your raw data into your data warehouse. You can then define transformations in SQL and run them in the data warehouse at query time.
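To make the load-then-transform ordering concrete, here is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for a cloud warehouse, and the table, view, and column names (`raw_orders`, `revenue_by_status`, `amount_cents`) are hypothetical. Raw rows are loaded untouched, and the transformation lives in a SQL view that runs at query time.

```python
import sqlite3

# sqlite3 stands in for a cloud data warehouse; only the ordering (load raw, then transform in SQL) matters here.
conn = sqlite3.connect(":memory:")

# Extract + Load: land the raw rows as-is, even with amounts stored as text and test orders mixed in.
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount_cents TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "1250", "paid"), (2, "800", "paid"), (3, "500", "refunded"), (4, "999", "test")],
)

# Transform: the cleanup is a SQL view, so it is evaluated inside the warehouse whenever it is queried.
conn.execute(
    """
    CREATE VIEW revenue_by_status AS
    SELECT status, SUM(CAST(amount_cents AS INTEGER)) / 100.0 AS revenue
    FROM raw_orders
    WHERE status <> 'test'
    GROUP BY status
    """
)

rows = conn.execute("SELECT status, revenue FROM revenue_by_status ORDER BY status").fetchall()
print(rows)  # [('paid', 20.5), ('refunded', 5.0)]
```

Because the raw table is kept, changing the business logic later means editing the view, not re-extracting data from the source.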
You may also be interested in a blog post our CTO wrote on this subject.
Q: How does Stitch compare to [another ETL platform]?
Some of the most common reasons our customers choose Stitch are:
- Stitch data integrations are powered by Singer, an open source standard that Stitch developed for writing scripts that move data. Singer integrations are free for anyone to use, whether or not they're a Stitch customer, and Singer’s extensibility makes it possible to connect Stitch to virtually any data source.
- Stitch is built to be self-service – you can get started without talking to a salesperson. Most of our clients don't need to interact with our team to get set up or manage their data pipeline. But if they do ...
- We have an experienced and responsive Support team. We have in-app chat and world-class documentation to help clients who have questions. We can also provide phone support and SLAs to customers as part of an Enterprise contract.
- Our team is very experienced at building and maintaining data pipelines. RJMetrics, the company Stitch was spun out of, has been in operation since 2008. In that time we've built and rebuilt dozens of data pipelines, integrations, and data loaders.
- We have simple, transparent pricing plans based on the number of rows loaded and an enterprise plan for companies looking for advanced features.
Q: Do you charge per integration, by volume of rows/events, or some other way?
Our self-serve plans are tiered by data volume. In all of our plans, you can use as many of our integrations as you like at no extra cost. If you're already using Stitch for one data source, we encourage you to add more.
Our Enterprise plans are custom-built based on the needs of your organization. If you’re interested in an enterprise-grade ETL platform for your mission-critical data, please contact our Sales team for more details.
Q: How quickly will data be available in my data warehouse?
Within a few minutes. If you need a data latency SLA, please contact our Sales team for more details.
Q: How does Stitch determine when to replicate my data?
You can set the Replication Frequency for each integration individually; it determines how often Stitch attempts to extract data from that data source.
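As a rough illustration of per-integration scheduling, the sketch below decides which integrations are due for extraction. The integration names and intervals are hypothetical (in Stitch, the frequency is chosen in the app), and this is not Stitch's actual scheduler.

```python
from datetime import datetime, timedelta

# Hypothetical per-integration Replication Frequency settings.
REPLICATION_FREQUENCY = {
    "postgres_prod": timedelta(minutes=30),
    "salesforce": timedelta(hours=1),
    "zuora": timedelta(hours=6),
}


def due_integrations(last_run, now):
    """Return integrations whose frequency interval has elapsed since their last extraction.

    An integration that has never run (missing from last_run) is always due.
    """
    due = []
    for name, frequency in REPLICATION_FREQUENCY.items():
        previous = last_run.get(name, datetime.min)
        if now - previous >= frequency:
            due.append(name)
    return sorted(due)


now = datetime(2024, 1, 1, 12, 0)
last_run = {
    "postgres_prod": now - timedelta(minutes=45),  # interval elapsed
    "salesforce": now - timedelta(minutes=20),  # not yet
}
print(due_integrations(last_run, now))
```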
Q: How secure is Stitch?
We take security very seriously – see our security page and, for more details, our security FAQ.
Q: I see you support a lot of integrations – what about [integration we don’t support yet]?
There are three paths for adding new integrations. If you need an integration for a new data source immediately, you can build it yourself with the open source Singer framework, and it will run in Stitch; check out the Singer Getting Started guide, and bring any questions to the Singer Slack group. You can also work with one of our implementation partners, who are experienced in building custom integrations for use with Stitch. Finally, for Enterprise customers, we can include custom integration development and commercial support for community-developed integrations in the contract.
Q: What is Singer?
Singer is an open source platform that lets anyone write and collaborate on scripts that move data between databases, web APIs, files, queues, and just about anything else you can think of. You can submit Singer integrations to our Product team for inclusion in Stitch; once an integration is accepted, you can run it in Stitch like any other integration. By running a Singer integration on Stitch's platform, you get auto-scaling, secure infrastructure, credential management, monitoring, and alerting. You can also run Singer integrations on hardware that you manage.
Singer is made up of three parts:
- Taps, which pull data from sources
- Targets, which send data to destinations
- A JSON-based format for communication between taps and targets
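The sketch below shows what a minimal tap could look like. It writes Singer's JSON messages to stdout, one per line: a SCHEMA message describing the stream, a RECORD message per row, and a STATE message carrying a bookmark. The stream name, schema, and rows are hypothetical, and it uses only the standard library rather than any Singer helper package; real taps would read from a source API or database.

```python
import json
import sys

# Hypothetical stream; a real tap would discover rows from a source system.
STREAM = "users"
SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "email": {"type": "string"},
        "updated_at": {"type": "string", "format": "date-time"},
    },
}
ROWS = [
    {"id": 1, "email": "ada@example.com", "updated_at": "2024-01-01T00:00:00Z"},
    {"id": 2, "email": "alan@example.com", "updated_at": "2024-01-02T00:00:00Z"},
]


def tap_messages(rows):
    """Yield Singer messages: one SCHEMA, one RECORD per row, then a STATE bookmark."""
    yield {"type": "SCHEMA", "stream": STREAM, "schema": SCHEMA, "key_properties": ["id"]}
    for row in rows:
        yield {"type": "RECORD", "stream": STREAM, "record": row}
    # Bookmark the newest updated_at so a later run could extract only newer rows.
    latest = max(row["updated_at"] for row in rows)
    yield {"type": "STATE", "value": {"bookmarks": {STREAM: {"updated_at": latest}}}}


def main():
    # Singer messages are newline-delimited JSON on stdout.
    for message in tap_messages(ROWS):
        sys.stdout.write(json.dumps(message) + "\n")


if __name__ == "__main__":
    main()
```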
Because every tap and target speaks the same format, any tap can be paired with any target, so changing the destination you're loading data into is easy. And since it's all open source, community members can build on each other's improvements.
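To show why mixing and matching works, here is a sketch of the target side. It consumes the same newline-delimited JSON messages a tap emits and dispatches on each message's `type`. The function name `run_target` is hypothetical, and it loads into in-memory lists instead of a real warehouse, but the message handling is what makes any tap pluggable into it.

```python
import json


def run_target(lines):
    """Consume Singer messages (one JSON object per line) and collect records per stream.

    A real target would write to a destination such as a warehouse; here, tables are plain lists.
    To read from a pipe, pass sys.stdin as `lines`.
    """
    schemas, tables, state = {}, {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        kind = message["type"]
        if kind == "SCHEMA":
            schemas[message["stream"]] = message["schema"]
            tables.setdefault(message["stream"], [])
        elif kind == "RECORD":
            stream = message["stream"]
            if stream not in schemas:
                raise ValueError(f"RECORD for {stream!r} arrived before its SCHEMA")
            tables[stream].append(message["record"])
        elif kind == "STATE":
            # Keep the latest STATE; a real target would persist it only after
            # flushing the records that preceded it.
            state = message["value"]
    return tables, state
```

Any tap's output can be piped into any target this way, for example `python tap.py | python target.py`, assuming each script reads or writes stdio as above.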
Check out the Singer Getting Started guide, and join the Singer Slack group to get help from the community and see what other people are working on.
Q: What happens to data in the pipeline if the data warehouse gets disconnected? Could I lose data, or wind up with duplicate data when the pipeline is reconnected?
Stitch is architected to prevent data loss or duplication. We buffer data once it's in the pipeline, so if a data warehouse gets disconnected, nothing will be lost as long as it's reconnected before the buffer expires. Most customers have a two-week buffer; Enterprise customers can define custom data retention policies and expiration intervals.
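The FAQ doesn't describe Stitch's internal mechanism, so the sketch below is only an illustration of the general idea under stated assumptions: batches wait in a buffer with a retention window (14 days, matching the typical buffer above), are replayed in order on reconnect, and are loaded with an upsert keyed on a primary key so a replayed batch overwrites rows rather than duplicating them. The class names, the `id` key, and the `upsert` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

DEFAULT_RETENTION = timedelta(days=14)  # the typical buffer length mentioned above


@dataclass
class Batch:
    buffered_at: datetime
    rows: list  # each row is a dict with a hypothetical primary key column "id"


@dataclass
class PipelineBuffer:
    retention: timedelta = DEFAULT_RETENTION
    pending: list = field(default_factory=list)

    def add(self, batch):
        """Hold a batch while the warehouse is unreachable."""
        self.pending.append(batch)

    def flush(self, now, warehouse):
        """On reconnect, load unexpired batches in arrival order; return how many batches expired."""
        live = [b for b in self.pending if now - b.buffered_at <= self.retention]
        expired = len(self.pending) - len(live)
        for batch in live:
            upsert(warehouse, batch.rows)
        self.pending.clear()
        return expired


def upsert(warehouse, rows):
    """Idempotent load keyed on "id": loading a row twice overwrites it instead of duplicating it."""
    for row in rows:
        warehouse[row["id"]] = row
```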
Q: How does Stitch handle replication when a data source changes its schema?
How Stitch handles data structure changes in a data source varies depending on the integration, as well as the replication method used for a given table within that integration.
For most SaaS integrations, Stitch defines schemas in code based on each data source's API documentation. Changes to that structure require updating the integration within Stitch. We make these updates on behalf of all our customers when one of our Stitch-certified data sources deprecates an old version of its API.
For database integrations, and for some SaaS integrations that support custom fields, such as Salesforce and Zuora, we interpret the schema using the system tables in the source instance. At extraction time, Stitch first performs a "structure sync," during which we detect the structure of the source instance and save that information to the Tables to Replicate page for your integration.
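A structure sync boils down to asking the database's own catalog what columns a table has. The sketch below uses SQLite's `pragma_table_info` table-valued function as a stand-in for the system tables a production database exposes (for example, `information_schema.columns` in PostgreSQL or MySQL). The function name `structure_sync` and the `customers` table are hypothetical.

```python
import sqlite3


def structure_sync(conn, table):
    """Return {column_name: declared_type} for a table, read from the database catalog."""
    rows = conn.execute("SELECT name, type FROM pragma_table_info(?)", (table,))
    return {name: col_type for name, col_type in rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, plan TEXT)")
print(structure_sync(conn, "customers"))  # {'id': 'INTEGER', 'email': 'TEXT', 'plan': 'TEXT'}
```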
From there, how we handle structural changes depends on the replication method defined for a given table. For tables that use our key-based incremental replication method, we can apply the structural changes in the destination: we append new columns, split a destination column to accommodate a new data type, and stop loading data to a column that has been removed from the source, as explained in our data loading guide for each destination.
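The three kinds of change described above can be expressed as a diff between two structure syncs. This is a sketch, not Stitch's implementation: the function name `plan_changes` and the `{column}__{type}` suffix used for split columns are hypothetical, and the actual split-column naming is destination-specific and documented in each destination's data loading guide.

```python
def plan_changes(old, new):
    """Compare two {column: type} snapshots and list the destination changes to make.

    Returns tuples of:
      ("add_column", column, type)         column is new in the source
      ("split_column", column, new_column) column's type changed (hypothetical suffix naming)
      ("stop_loading", column)             column was removed from the source
    """
    actions = []
    for column, col_type in new.items():
        if column not in old:
            actions.append(("add_column", column, col_type))
        elif old[column] != col_type:
            actions.append(("split_column", column, f"{column}__{col_type.lower()}"))
    for column in old:
        if column not in new:
            actions.append(("stop_loading", column))
    return actions


old = {"id": "INTEGER", "price": "INTEGER", "legacy_code": "TEXT"}
new = {"id": "INTEGER", "price": "TEXT", "email": "TEXT"}
for action in plan_changes(old, new):
    print(action)
```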
We don't currently support structural changes to tables that use our log-based incremental replication method. We use JSON Schema validation during extraction to make sure our customers' data is always loaded accurately, but several databases' binary logs don't include column information in the log files we read from, so a structural change can't be detected from the log itself. As a result, a schema change in one of these tables generally leads to critical errors during extraction and requires a full re-replication of the source table before replication for that integration can proceed. This limitation of log-based replication is explained in more detail in our documentation.
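Here is a simplified illustration of why this fails. Assume a log row arrives as a list of values by position with no column names, which is how row events work in some database logs by default. Those values are then mapped onto the schema captured at structure sync and type-checked. If the source table gains or changes a column, the positional values no longer line up, and validation fails instead of loading misaligned data. The column list and schema are hypothetical, and the tiny type checker stands in for a full JSON Schema validator.

```python
# Schema captured at the last structure sync (hypothetical two-column table).
COLUMNS = ["id", "email"]
SCHEMA = {"id": "integer", "email": "string"}


class SchemaMismatch(Exception):
    """Raised when a log row no longer fits the schema captured at structure sync."""


def _matches(value, json_type):
    # Minimal stand-in for JSON Schema type checks; bool is excluded because it subclasses int.
    if json_type == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if json_type == "string":
        return isinstance(value, str)
    raise ValueError(f"unsupported type {json_type!r}")


def decode_log_row(values):
    """Map positional log values onto the stored column order, then validate each value."""
    if len(values) != len(COLUMNS):
        raise SchemaMismatch(f"expected {len(COLUMNS)} values, got {len(values)}")
    record = dict(zip(COLUMNS, values))
    for column, value in record.items():
        if not _matches(value, SCHEMA[column]):
            raise SchemaMismatch(f"{column}={value!r} is not a valid {SCHEMA[column]}")
    return record


print(decode_log_row([1, "ada@example.com"]))  # {'id': 1, 'email': 'ada@example.com'}
```

After a column is added in the source, a row such as `[1, 42, "ada@example.com"]` raises `SchemaMismatch`, which is why the table must be re-replicated with a fresh structure sync.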
Q: What's the difference between your Certified and Community integrations?
Both Certified and Community integrations offer a number of benefits:
- Running at scale on a reliable infrastructure
- Secure credential management
- Setup and configuration via the Stitch interface
- Configurable scheduling
- Automated notifications when something goes wrong
- Log exploration interface
- Access to all Stitch-supported destinations
The key difference is that Stitch provides commercial support for Certified integrations but not for Community integrations – though support for Community integrations can be included in Enterprise contracts. Commercial support is a guarantee that the Stitch team will fix bugs and adapt to new versions of third-party APIs. Maintenance of Community integrations is handled by members of the Singer open source community.
Q: What do I need to do to get started using Stitch?
Sign up, add a data source and a destination, and you’re ready to go. We offer an unlimited 14-day trial, so feel free to connect all of your systems to Stitch. Our Getting Started guide walks you through the process, and our Support team can help out with any bumps you hit along the way.