Bytecode delivers custom integrations with Singer and Stitch

The Guild, an Austin, Texas-based travel company that curates a collection of boutique hotels located in upscale residential buildings, is a data-driven business. It maintains a Google BigQuery data warehouse fed by a Stitch ETL pipeline that replicates data from SaaS platforms such as Google Ads and Delighted and from databases such as MySQL. But The Guild wanted to add data from three more platforms to its data warehouse, platforms for which Stitch had no native integration.

The company turned to Stitch implementation partner Bytecode IO for help. Bytecode offers consulting and development services around business intelligence, cloud architecture, data integration, and data warehousing. They specialize in setting up organizations' data stacks. For The Guild, Bytecode built three Singer taps that allow the company to pipe data from those platforms — Revinate, Front, and Typeform — through Stitch and into their data warehouse.

Revinate is a hotel CRM and email marketing platform. Front provides a shared inbox for teams and a single collaborative workspace across email, social media, chat, and SMS. Typeform specializes in online form-building and online surveys. All three provided valuable market data that The Guild wanted to use for business intelligence (BI).

Singer is an open source ETL framework that Stitch created to enable anyone, including Stitch customers, partners, and the open source community, to build new integrations that can be run on Stitch or on their own hardware. Singer provides a standard for writing scripts that move data using JSON, which allows businesses to quickly build integrations for new platforms, and also reuse integrations that other organizations have written. The community has written more than 60 integrations.
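As a rough illustration of what that JSON standard looks like in practice, the sketch below writes the three message types defined by the Singer specification (SCHEMA, RECORD, and STATE) as one JSON object per line. The stream name, fields, and bookmark value are hypothetical, and the helper functions are written for this example rather than taken from any existing tap.

```python
import io
import json
import sys


def write_message(message, out=sys.stdout):
    """Write one Singer message as a single line of JSON."""
    out.write(json.dumps(message) + "\n")


def emit_stream(stream, schema, key_properties, records, bookmark, out=sys.stdout):
    """Emit a SCHEMA message, one RECORD per row, then a STATE bookmark."""
    write_message(
        {
            "type": "SCHEMA",
            "stream": stream,
            "schema": schema,
            "key_properties": key_properties,
        },
        out,
    )
    for record in records:
        write_message({"type": "RECORD", "stream": stream, "record": record}, out)
    # STATE lets a later run resume from where this one stopped.
    write_message({"type": "STATE", "value": bookmark}, out)


if __name__ == "__main__":
    # Hypothetical stream and fields, for illustration only.
    response_schema = {
        "type": "object",
        "properties": {
            "id": {"type": "string"},
            "submitted_at": {"type": "string", "format": "date-time"},
        },
    }
    rows = [{"id": "r1", "submitted_at": "2019-01-01T00:00:00Z"}]
    emit_stream(
        "responses",
        response_schema,
        ["id"],
        rows,
        {"responses": "2019-01-01T00:00:00Z"},
    )
```

A target reads these lines from standard input, which is why a tap written by one organization can feed a target written by another.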

Building Singer taps

Bytecode senior data engineers Jeff Huth and Mike Taluc built the three taps for The Guild. When they started the project they were familiar with Singer but new to developing taps.

"There was definitely a learning curve involved," Huth says. "But the fact that Singer is open source allowed us to leverage what had been done already and got us up to speed quickly."

To get started, the developers cloned existing taps' repositories and studied them. Taluc says, "Every data source's API has subtle differences in the way the data is returned, the way it does paging, the way authorization happens. The development process started with us getting on the command line, running some curl/REST API calls, and seeing how the data is returned. Once we had that it was fairly easy to put it into Python calls. [You can use any language to write a Singer tap, though most taps are written in Python.] The hardest part was figuring out schema parsing. You have to format the data in a way your target can read easily."
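The paging differences Taluc mentions can be sketched as follows. This is a minimal example that assumes a page-numbered API whose responses carry an "items" list and a "total_pages" count; real APIs (including the three in this project) may use cursors, offsets, or link headers instead. The fetch function is an in-memory stand-in, so no real endpoint or authentication is involved.

```python
from typing import Any, Callable, Dict, Iterator, List

Page = Dict[str, Any]


def iter_records(fetch_page: Callable[[int], Page]) -> Iterator[dict]:
    """Yield every record from a page-numbered API until the last page."""
    page_number = 1
    while True:
        page = fetch_page(page_number)
        for item in page["items"]:
            yield item
        if page_number >= page["total_pages"]:
            break
        page_number += 1


def make_fake_api(rows: List[dict], page_size: int) -> Callable[[int], Page]:
    """Build an in-memory stand-in for a paged HTTP endpoint (hypothetical)."""
    total_pages = max(1, -(-len(rows) // page_size))  # ceiling division

    def fetch_page(page_number: int) -> Page:
        start = (page_number - 1) * page_size
        return {
            "items": rows[start:start + page_size],
            "total_pages": total_pages,
        }

    return fetch_page


if __name__ == "__main__":
    fetch = make_fake_api([{"id": n} for n in range(5)], page_size=2)
    for record in iter_records(fetch):
        print(record)
```

Keeping the paging loop separate from the HTTP call means the same loop can be exercised against a fake before it is pointed at a live API.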

Huth had a similar experience. "The process was first understanding the source API: figuring out what endpoints and API calls to focus effort around and how to authenticate. Then we had to figure out how to format API call returns into a JSON structure, and figure out the best way to represent that in the database: which fields do I need, how do I transform to get them to look like what I want in the data warehouse? What are the primary keys for each object? How does the API return records, and how do I loop through them?"
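The questions Huth lists (which fields to keep, how to reshape them, and what the primary keys are) might translate into a transform step like the one below. This is a sketch under stated assumptions: the double-underscore separator, the field names, and the sample record are all hypothetical, and it is not the code Bytecode wrote.

```python
from typing import Any, Dict, List


def flatten(record: Dict[str, Any], parent_key: str = "", sep: str = "__") -> Dict[str, Any]:
    """Collapse nested objects into one level, joining key names with sep."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def to_row(record: Dict[str, Any], fields: List[str], key_properties: List[str]) -> Dict[str, Any]:
    """Keep only the chosen fields and refuse records missing a primary key."""
    flat = flatten(record)
    missing = [key for key in key_properties if flat.get(key) is None]
    if missing:
        raise ValueError(f"record is missing primary key field(s): {missing}")
    # Fields absent from the API response become None (NULL in the warehouse).
    return {field: flat.get(field) for field in fields}


if __name__ == "__main__":
    # Hypothetical API response, for illustration only.
    raw = {"id": 7, "guest": {"email": "a@example.com", "name": "Ann"}, "extra": 1}
    print(to_row(raw, ["id", "guest__email", "score"], ["id"]))
```

Rejecting records without a primary key early is one design choice among several; a tap could also log and skip them.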

Following open source best practices

Huth says they built each tap object by object in Python. Though the code for the taps they wrote was for a specific client, Taluc says the process was "sort of like writing taps for the public, since the code is open source. We made sure to add good comments to the code, and to make each module reasonably generic, so it can be usable by a larger audience than just the client we're working for today."

"You've got to write the code so that if someone later wants to write in another API call, they can do that," Huth says. "There's a balance between making it specific and making it generic."

Along the way, the Bytecode developers got help from both Stitch developers and the Singer community. "The Singer Slack channel was really useful," Huth says. "We would usually get responses within an hour or two, which is pretty fast."

Eventually each tap was committed to the Singer GitHub repository. A Stitch engineer did a code review and provided recommendations.

"That was an iterative process," Huth says. "We'd find issues, fix them, retest, and verify. One of the Singer tools, the singer-check-tap utility, is really good, but there were some issues that even it didn't catch. But we had really fast turnaround time to iterate on the Stitch platform."

All three taps are currently in private beta while Bytecode works with The Guild on acceptance testing. Once everyone is satisfied that everything is running successfully, the Stitch team will work with Bytecode to prep the taps for wider release. Eventually they'll be available to download from the Singer repository for anyone who might want to extract data from those sources, and they'll become community integrations within Stitch.

The Bytecode developers found the Singer/Stitch platform to be a valuable solution. Huth says, "Stitch lets you see things like what jobs are running and how many records moved, so you have some visibility if you run into any problems. It's a good platform for managing multiple integrations. It offers lots of open source tooling. And it's fairly easy to develop new taps.

"If you can homogenize your ETL tech, that's a pretty big win. It's a really elegant pipeline model."

Image credit: Earl McGehee