Why our ETL tool doesn’t do transformations

A lot of people ask us why Stitch doesn’t do transformations.

This is a reasonable question; we advertise Stitch as an ETL (extract, transform, load) tool, and some of these tools have extensive transformation functionality. To directly answer the question: We do transformations, just not arbitrary transformations.

Our goal is to get your data from source to destination in a useful, raw format. Useful means with types and structures that make the data easy to work with, and raw means staying as close to the original representation as possible. We do the transformations required to get this done — like translating one database’s types into another’s, and breaking nested structures into relations.

But that’s where we stop. We stop because once you have raw data, what you do with it depends on your needs. With raw data, you have the foundation of a robust data infrastructure. You have the input to any transformed data set, and an audit trail to figure out how a non-nullable field became null, or why your weekly flash report is showing more leads than your lead attribution model is counting. You also have the means to fix those issues retroactively.

We advocate adding transformations on top of raw data in layers. Lower layers apply to all use cases – like cleaning out test data or normalizing certain values. Higher layers are less general – for example, a view of the data for your marketing team that filters out sensitive customer information. The top transformation layers are application-specific, and this is where most of the interesting work is done. A reporting tool can do daily or hourly rollups, while a more advanced model like a sessionizer can store intermediate values like session tokens.

On our own data, we prefer to keep transformation logic in SQL using a transformation tool called dbt, since our team is fluent in it, and data warehouses like Amazon Redshift and Google BigQuery can perform transformations at blazing speeds. But there are plenty of other tools out there for building data transformations, from general-purpose frameworks like Apache Spark to cloud-based services like Google Cloud Dataflow to GUI-based tools like Kettle.

So should Stitch be called an ETL tool?

Yes. It extracts data, transforms it, and loads it into a database or data warehouse. I’m sure that we’ll continue to get questions about why we don’t support arbitrary transformations, but that’s OK. We’re trying to enable a different, better workflow. It’s not for everyone, but if you think it might be for you, give it a try for free.