What can you use a data pipeline for? Turns out, quite a lot

Before we set out to build the world’s best data pipeline, we needed to know whether the need was really there. Do companies really need data pipelines? Why? What do they do with them? What kind of companies feel this need? How big are they? What industry are they in?

To answer these questions, we conducted hundreds of interviews with people who use data as a core part of their jobs. We were struck by the diversity of what we found. We spoke to companies from large to small, in a wide variety of verticals, with wide-ranging use cases. I wanted to share a few of these data points because much of what I see written about data infrastructure focuses on the how, not the why. And the why is important too.

Here is a sampling of the companies we spoke with:

  1. Public company with a 20-person data science team
    One of their main internal priorities is aggregating data on product usage. They need this both so they can make data-driven product improvements, and so that they can provide reporting on usage back to their customers. The data team also supports the marketing and finance teams with data for forecasting purposes.

  2. Ecommerce platform that powers thousands of merchants
    The merchants who use this platform get basic reporting functionality bundled with it, but that reporting is limited to transactional data. The merchants want to see the whole picture of their business. For the ecommerce platform to be their single source of truth, it needs to consolidate data from the other tools the merchants use.

  3. Social media startup with over 100 employees
    They are implementing a new BI tool that does not support a secure connection to their internal database, but it does natively integrate with Redshift. This company also wants to consolidate data from their SaaS tools to facilitate direct query access and access through the BI tool’s dashboards.

  4. Two AI startups
    One powers web personalization, and the other dynamically manages ad bids and placements. Both companies consider their algorithms to be their competitive edge, but only run those algorithms on a single data source. They want to train their models on additional data sources, but they know that data aggregation is outside of their core competency.

  5. SaaS company with millions of dollars of revenue
    Their data is spread across their internal application database, ad networks, web traffic analytics, marketing automation, CRM, and more. The most important analytics priority for them is to understand their full lifecycle ROI, which requires data from across their funnel and all customer touchpoints.

  6. Crowdsourcing marketplace with tens of millions in annual revenue
    This tech team dramatically improved the performance of their site and application by introducing a sharded database architecture. The downside of this approach is that analytical queries became far more difficult, even when the question they were trying to answer was a simple one. They needed to consolidate their dozens of discrete application databases into a single data warehouse in order to run any coherent analyses.

  7. On-demand worker startup growing 20% month over month
    Their data comes from a dispatch system, mobile app, internal database, and CRM. Speed is their top priority, and they need to quickly incorporate new data sources to enable their team to make decisions.

  8. Public apparel company that sells through physical retail, online, and wholesale
    They have separate ecommerce shopping carts for each brand, an in-store point-of-sale system, email, online advertising, and web traffic data. They have invested millions of dollars into a relationship with a consulting firm that cobbled together legacy ETL and data warehousing tools along with custom services. Changes to their data model or processing workflow require weeks of planning and lead time to implement.

  9. Marketing agency that manages ad spend for customers
    They test marketing strategies on behalf of customers across many different channels, including AdWords, Facebook, LinkedIn, Twitter, Bing, and more. They need to report on acquisition efficiency and payback periods for their customers. In addition to data from the ad networks, they also need information from their customers’ CRM, in-house databases, and shopping carts in order to understand and report on the success of their ads.

Virtually every company we spoke with needed to improve the way they consolidate their data. While Stitch may not be a perfect fit for all of them, the need we’ve seen over and over again is the same: combine data from disparate sources into a single high-performance data store, at unlimited volume and with low latency. And that’s exactly what we built Stitch to do.

I’m excited to help these companies get more out of their data. If you’d like to explore how Stitch can help you, sign up now and get 5 million events a month free, forever.