How to connect a Singer tap with Stitch

We think it’s critical that ETL be extensible to support any data source. That's why we created the open source Singer project. Singer’s goal is to enable anyone — our team, our customers, our partners, and the open source community — to build and use integrations, regardless of whether they're a Stitch customer.

Many organizations use popular SaaS platforms such as Google Analytics and Google Ads, Salesforce, Marketo, Shopify, and Stripe, which are supported by Stitch and by other ETL tools. But businesses may also rely on applications that are less popular but still essential, and for which no integrations are available. We made Singer to provide a way to integrate with any application without being beholden to anyone’s roadmap.

We also made Singer open source to let organizations share their work and leverage the work of others. There's no point in reinventing the wheel when you can use or adapt an integration someone else has already created. And once you share your work, other developers may contribute code to enhance your Singer integration. The Singer community has already built dozens of integrations.

If you build a Singer integration, you can make use of it in three ways.

Run Singer entirely on your own hardware

Singer taps (which pull data from sources) and targets (which send data to destinations) are self-contained applications. You can run them in your own environment using the scheduling and orchestration solution of your choice. We wrote a tutorial on using Singer with Airflow. Other options include using Luigi, cron, and custom-built schedulers.
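Because a tap and a target are separate programs connected by a pipe, a scheduler only needs to start both and check their exit codes. Here is a minimal sketch of that idea in Python, using only the standard library. The function name run_pipeline and the command names in the usage note are hypothetical, and a real job would also need logging, retries, and state handling.

```python
import subprocess


def run_pipeline(tap_cmd, target_cmd):
    """Pipe a tap's stdout into a target's stdin.

    Returns (tap_exit_code, target_exit_code, target_stdout_text).
    Both arguments are argument lists, e.g. ["tap-example", "--config", "tap.json"].
    """
    tap = subprocess.Popen(tap_cmd, stdout=subprocess.PIPE)
    target = subprocess.Popen(
        target_cmd, stdin=tap.stdout, stdout=subprocess.PIPE
    )
    # Close our copy of the pipe so the tap sees a broken pipe
    # if the target exits early.
    tap.stdout.close()
    target_out, _ = target.communicate()
    tap_rc = tap.wait()
    return tap_rc, target.returncode, target_out.decode()
```

A cron job or an Airflow task could call something like run_pipeline(["tap-example"], ["target-example"]) (both names are placeholders) and treat any nonzero exit code as a failed run.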

An advantage of this approach is that you have total control of your data pipeline. If you have constraints that don't let you run your data through a cloud pipeline, or you don't want to open source any code, this is the way to go — but there are drawbacks.

Singer taps and targets will extract and load your data, but there's more to a data pipeline than that, and if you run the Singer applications on your own hardware, you'll have to write supporting infrastructure code yourself. We explain in another blog post why most people shouldn't build their own data pipeline.

Submit your tap to Stitch

Another approach, and the one most organizations take, is to have a Singer tap built into Stitch. This is the easiest option for the long run, because it requires the least ongoing work for you. Once you've written and tested your code, share it with the community. Stitch developers will review your code, collaborate with you to test the integration, and list your tap or target on the Singer site. We'll also integrate a new tap into Stitch, build documentation for it, and release it for others to use.

This approach gives you all of the advantages of being part of Stitch. Our pipeline is fast, highly reliable, and certified for HIPAA and SOC 2 security and privacy. We provide logging, monitoring, alerting when problems crop up, credential management, and autoscaling infrastructure.

Hybrid — run Singer locally and send to Stitch

You can also run taps on your own hardware and then send the data to the Stitch API, taking advantage of Stitch's features for the load phase. Stitch will make sure your data gets to your destination, whether that's a cloud data warehouse like Amazon Redshift, Google BigQuery, Snowflake, or Microsoft Azure SQL Data Warehouse, or a database like PostgreSQL. This is a good option for organizations that need to ETL data from a new tap right away, before we can certify their code and integrate it with Stitch, or ones that don't want to release the code for a proprietary data source under an open source license, but want to load to a Stitch-supported destination.

To do this, start by following the README instructions to install the Stitch target. If you're not already a Stitch user, sign up (it's free). Make note of your client ID — the six-digit number that appears in the URL after https://app.stitchdata.com/client/ — because you'll need to paste it into a config file.

You'll also need an API access token, which you generate by adding Stitch's Import API as a data source. Within Stitch, click on Add Integration, choose Import API, and follow the instructions in our documentation. Copy the token that Stitch generates.

Use a text editor and create a new file that looks like this:

{
  "client_id" : clientid,
  "token" : "token"
}

Paste in both the token and the client ID and save the file as config.json.
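If you would rather generate the file from a script than edit it by hand, a sketch like the following would work. It assumes the config needs only the two keys shown above, with the client ID stored as a number and the token as a string. The function name write_stitch_config and the sample values are hypothetical.

```python
import json


def write_stitch_config(path, client_id, token):
    """Write a config file with the two keys shown above.

    client_id is stored as a JSON number, token as a JSON string.
    """
    config = {"client_id": int(client_id), "token": token}
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return config


if __name__ == "__main__":
    # Placeholder values: substitute your own client ID and token.
    write_stitch_config("config.json", 123456, "your-import-api-token")
```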

Finally, in the directory in which you saved the config file, run the command

$ yourtap | target-stitch --config config.json

Use the name of your tap in place of yourtap, of course.

If all goes as expected, your tap will begin extracting data and sending it to Stitch, which will replicate it to the destination you've specified. If you need support along the way, join the Singer Slack community.

Get started with Singer

Stitch is adding integrations all the time, but if you have a custom data source that we don't support, follow the getting started documentation and try writing your own tap using Singer.
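To give a sense of what a tap does, here is a minimal sketch written with only the Python standard library. A tap writes JSON messages to standard output, one per line: a SCHEMA message describing a stream, RECORD messages carrying the rows, and a STATE message recording how far it got. The stream name users, the sample rows, and the function names are made up for illustration. A real tap would pull rows from your data source and would normally use the Singer helper library rather than hand-built messages.

```python
import json
import sys


def write_message(message, out=sys.stdout):
    """Write one Singer message as a single line of JSON."""
    out.write(json.dumps(message) + "\n")
    out.flush()


def run_tap(rows, out=sys.stdout):
    """Emit SCHEMA, RECORD, and STATE messages for a hypothetical 'users' stream."""
    write_message(
        {
            "type": "SCHEMA",
            "stream": "users",
            "key_properties": ["id"],
            "schema": {
                "type": "object",
                "properties": {
                    "id": {"type": "integer"},
                    "name": {"type": "string"},
                },
            },
        },
        out,
    )
    last_id = None
    for row in rows:
        write_message({"type": "RECORD", "stream": "users", "record": row}, out)
        last_id = row["id"]
    # The bookmark lets a later run resume after the last row sent.
    write_message({"type": "STATE", "value": {"users": last_id}}, out)


if __name__ == "__main__":
    # Hard-coded sample rows stand in for a real data source.
    run_tap([{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}])
```

Saved as an executable script, a program like this can take the place of yourtap in a tap-to-target pipeline.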

Image credit: darf_nicht_mehr_hochladen