Our customers have been asking us for more insight into the Stitch data pipeline’s extraction and load process, and for the last few months we’ve been working to deliver those capabilities. In November we announced the availability of extraction logs for several of our most popular integrations. Today we’re unveiling a complementary set of information from the load side.
Until now, each replication job for an integration displayed a single total of rows replicated, no matter how many tables were processed. That wasn’t sufficient granularity for customers, especially since we charge based on the number of rows they replicate; it made it difficult for users to audit their Stitch usage. One customer compared it to taking your car to a mechanic and being told, “We fixed seven things, and it’s going to cost you $1,000,” with no line-item details to explain how the mechanic arrived at that number.
Now Stitch users can see their row counts broken down by table. The breakdown helps identify where the rows they’re loading are coming from, shows which tables are sending rows to the data warehouse, and indicates whether a table is currently loading data or ended in error. This information is of particular interest when a database or SaaS data source contains nested data, because we de-nest that data when loading to Redshift and Postgres destinations. We now show the breakdown of all the subtables we create and the rows associated with them.
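To see why nested data multiplies row counts, here is a minimal sketch of the general de-nesting idea. This is an illustration only, not Stitch’s actual implementation: the `denest` helper, table names, and the `_orders_id` foreign-key column are all hypothetical, chosen to show how one nested record can produce rows in several tables.

```python
def denest(record, table="orders"):
    """Split one nested record into {table_name: [rows]} (hypothetical helper)."""
    tables = {table: []}
    parent_row = {}
    for key, value in record.items():
        if isinstance(value, list):
            # Nested arrays become a subtable named parent__key, with a
            # foreign key pointing back to the parent record.
            subtable = f"{table}__{key}"
            tables[subtable] = [
                {f"_{table}_id": record["id"], **item} for item in value
            ]
        else:
            parent_row[key] = value
    tables[table].append(parent_row)
    return tables

order = {
    "id": 1,
    "customer": "acme",
    "line_items": [
        {"sku": "A-100", "qty": 2},
        {"sku": "B-200", "qty": 1},
    ],
}
result = denest(order)
# One source record yields one row in "orders" and two rows in
# "orders__line_items" -- three billable rows across two tables.
```

A per-table breakdown makes this visible: a single "orders" sync can legitimately report far more rows than there are orders, once subtables are counted.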
In addition to seeing table-by-table row counts, Stitch users can now also drill down to view historical load information for each table, including when each load occurred and how many rows were loaded. We store that information for the same amount of time that we retain extraction logs: 24 hours for users on our Free plan, 7 days for those on our paid plans, and 60 days for Enterprise customers.
But wait, there’s more! For some of our most popular integrations, including MySQL, Google AdWords, Facebook Ads, Salesforce, and HubSpot, we now also provide the maximum value of each table’s replication key — the most recently synchronized data point — and the time when that data was extracted from the source. You can use this information to spot-check how fresh the data in your data warehouse is. Instead of manually inspecting the warehouse to infer when data was loaded, you can see at a glance how the data being loaded is changing over time.
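As a quick sketch of how you might use those two values in your own tooling: the maximum replication-key value and the extraction time are just timestamps, so a freshness check reduces to timestamp arithmetic. The `freshness_lag` helper and the example values below are hypothetical, not part of any Stitch API.

```python
from datetime import datetime, timezone

def freshness_lag(max_replication_key, extracted_at):
    """Return (replication lag, age of extraction) as timedeltas."""
    now = datetime.now(timezone.utc)
    # How far the newest synced record trailed the extraction, and how
    # long ago that extraction ran.
    return extracted_at - max_replication_key, now - extracted_at

# Example values as they might appear for one table in a load report:
max_key = datetime(2018, 1, 15, 11, 55, tzinfo=timezone.utc)   # newest record synced
extracted = datetime(2018, 1, 15, 12, 0, tzinfo=timezone.utc)  # when extraction ran
lag, age = freshness_lag(max_key, extracted)
print(f"Newest record trailed extraction by {lag}; extraction ran {age} ago.")
```

If `lag` stays small across runs, your replication key is keeping up with new data; a growing `age` suggests extractions have stopped running on schedule.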
To get more information about new features in Stitch as we roll them out, subscribe to our changelog.