Data engineers can relax with Stitch ETL
"Headspace is a merger of two companies," says Director of Engineering Josh Carver. "On one side we had Ginger and then on the other side, we had Headspace. We're a broad-spectrum health care provider. We offer everything from mental wellness through guided meditation in our mobile app, all the way through therapy and psychiatry."
"Stitch has been in use at Ginger for years and years and years, and predates me joining Ginger," Carver explains. Things were a lot different when he started at the company in 2019. Like many high-growth business businesses, Ginger grew from a lean and flexible operation. "When I joined, I was not running data engineering — there wasn't a data engineering team. I was doing completely different stuff."
Carver's role has changed completely since Ginger merged with Headspace in 2021. He is now responsible for about a dozen data engineers in three groups. The Data Engineering team — which manages Stitch and an Amazon Redshift data warehouse — focuses on platform and infrastructure. Product Engineering creates data products for internal business analysts, machine learning engineers, and data scientists to consume. Finally, Analytics and Business Intelligence is a machine learning and data science team.
"We offer text-based messaging between coaches and members, and so data from those services is replicated via Stitch into our Redshift cluster," he says, "and then Looker is currently pointed at Redshift."
"In addition to that, we do send a lot of mobile events into Amplitude. We connect directly to API service databases that that we own. There are primary services that have RDS Amazon databases that we connect to, and we replicate records from several tables into Redshift."
"We do connect to Salesforce and then pull that into Redshift. And then a few other smaller things, like there's a couple of Google Sheets that are manually maintained," Carver notes. "We pull that into Redshift as well via Stitch."
"What we're doing right now is trying to unify tool sets. We started with Amazon accounts, and then we're working through what kind of tech stack we want for the time being. No plans to remove Stitch or anything like that! The use case has remained stable over the past several years."
Carver reports that Stitch keeps serving its purpose through all this growth. "Stitch is a pretty easy point-and-click solution to replicate data into our data warehouse. And it's saved us a lot of time in terms of setup," he says. "Our primary use case is replicating database rows into Redshift. Right now, that's probably 80 or 90% of what we do with Stitch."
"If we were to turn Stitch off, that means all those dashboards would stop updating. It does have a really big impact on the business's ability to make decisions pertaining to the care side of our business."
Carver's data science team also needs mountains of raw data for machine learning projects. "Our primary storage format is an open-source storage format by Databricks called Delta Lake. And then we do a lot of SageMaker model training. We use PagerDuty for alerting, Sentry for exception tracking, and then we use a lot of [Google] Cloud Launcher and Prometheus metrics depending on which side of the stack you're on."
"If you look at alternatives," he explains, "you can snapshot an RDS database using Amazon Lambda function once a day and kind of load that into Redshift, but it's considerably more work for an engineer to build that and maintain it. It's worked out to our benefit to pay for Stitch to do it for us. We're probably saving a couple of sprints — a quarter of an engineer's time — to build and maintain those critical pipelines ourselves."
"Stitch is powering all the data that shows up in executive dashboards pertaining to care and clinical operations metrics," says Carver, Director of Engineering.