Singer is all about helping you work with data more easily – from any data source to any destination. Want to get API data into a CSV file? Cool. A SaaS data set into a Google Sheet? No problem.
Singer is an open source standard for writing scripts that move data, so you can write what Singer calls a tap to extract data from any data source, and send it to any destination, or target.
To illustrate, let's write a tap and load its output into a CSV file, using the Marvel API as the data source.
For this example we’ll call character endpoints and organize everything by name, per the interactive Marvel Swagger documentation.
If you’d like to play along, you’ll need your own Marvel API keys, which you can get by registering on the site.
On the Singer side, check out the Singer getting started documentation – notably the part about developing a Python tap:
Singer needs a schema, which the singer-python library uses to output data in the right format with the write_schema function. The schema itself is written in JSON Schema. Singer then uses the write_records function to say, “Now that we know how we want our data to look and where it should go, we’re going to send it where you want.”
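To make the two message types concrete, here is a minimal sketch that builds them by hand with only the standard library. It is not the singer-python implementation; the helper names (schema_message, record_message, emit) and the sample data are made up for illustration. The message shapes follow the Singer specification, in which a tap prints one JSON message per line.

```python
import json
import sys


def schema_message(stream, schema, key_properties):
    """Build a Singer SCHEMA message describing the records that follow."""
    return {
        "type": "SCHEMA",
        "stream": stream,
        "schema": schema,
        "key_properties": key_properties,
    }


def record_message(stream, record):
    """Build a Singer RECORD message carrying one row of data."""
    return {"type": "RECORD", "stream": stream, "record": record}


def emit(message, out=sys.stdout):
    """Write one message as a single line of JSON, the way a tap writes to stdout."""
    out.write(json.dumps(message) + "\n")


# Hypothetical schema and row, for illustration only.
example_schema = {
    "type": "object",
    "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
}

emit(schema_message("characters", example_schema, ["id"]))
emit(record_message("characters", {"id": 1, "name": "Example Hero"}))
```

The schema goes out first so that a target knows what shape of record to expect on the stream.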
The actual tap
Let’s walk through the marvel.py file, which you can find in the Marvel tap on GitHub.
After several lines of imports, the code moves into:
Reading our API keys from a config file (check out config_example.json)
Parsing the command-line arguments. This is important, because to run Singer we need to specify where the config file is and the type of output we want. We’ll come back to this at the end:
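As a rough sketch of those two steps, using only the standard library: the -c flag matches the command shown at the end of this post, but the config key names (public_key, private_key) are assumptions, so check config_example.json for the real ones.

```python
import argparse
import json


def parse_args(argv=None):
    """Parse the command line; -c/--config points at the JSON config file."""
    parser = argparse.ArgumentParser(description="Marvel tap (sketch)")
    parser.add_argument("-c", "--config", required=True,
                        help="Path to the JSON config file")
    return parser.parse_args(argv)


def load_config(path, required_keys=("public_key", "private_key")):
    """Load the config file and fail early if a key is missing.

    The key names are hypothetical placeholders for the two Marvel API keys.
    """
    with open(path) as handle:
        config = json.load(handle)
    missing = [key for key in required_keys if key not in config]
    if missing:
        raise ValueError("Config is missing required keys: " + ", ".join(missing))
    return config
```

Inside a main function this would be called as load_config(parse_args().config).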
Next comes the Marvel API part of the script. Marvel's documentation explains the required parameters: a timestamp, our public API key, and an MD5 hash of the timestamp, private key, and public key:
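Here is a small sketch of how those parameters can be built. The parameter names ts, apikey, and hash and the md5(ts + privateKey + publicKey) recipe come from Marvel's authorization documentation for server-side apps; the function name auth_params is made up for this example.

```python
import hashlib
import time


def auth_params(public_key, private_key, ts=None):
    """Return the query parameters Marvel expects on a server-side call."""
    # Any string that changes between requests works as ts; the current
    # Unix time is a convenient choice.
    ts = str(int(time.time())) if ts is None else str(ts)
    # The hash is the MD5 digest of timestamp + private key + public key.
    digest = hashlib.md5((ts + private_key + public_key).encode("utf-8")).hexdigest()
    return {"ts": ts, "apikey": public_key, "hash": digest}
```

Only the public key travels in the request; the private key is used solely to compute the hash.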
There are also limits on calls, which we define in a variable:
Next, we set up our JSON schema:
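For a sense of what such a schema can look like, here is a sketch written as a Python dict. The choice of fields is an assumption (a handful of the fields a Marvel character record carries), not necessarily the set the tap uses, and the keep_schema_fields helper is hypothetical.

```python
# A JSON Schema for one character, expressed as a Python dict.
# The selection of fields below is illustrative, not the tap's actual schema.
CHARACTER_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "description": {"type": "string"},
        "modified": {"type": "string"},
    },
}


def keep_schema_fields(row, schema=CHARACTER_SCHEMA):
    """Drop any keys from an API result that the schema does not describe."""
    return {key: row[key] for key in schema["properties"] if key in row}
```

Trimming each row to the schema's fields keeps the output columns predictable once the data lands in a CSV.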
Now we're ready to call the API. The Marvel API limits the number of results it returns per call, so we’ll use a while loop to keep asking the API for more rows and sending them on toward the CSV file until there aren’t any more. The loop first calls the Marvel API with all the parameters it asks for, then writes a record from the results the API call returns:
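The shape of that loop can be sketched as follows. To keep the example runnable without network access or API keys, the HTTP call is passed in as a function (fetch_page, a hypothetical name) that returns the parsed JSON response. The sketch assumes the response nests rows under data.results and reports the overall count in data.total, and that 100 is the largest page size the API allows.

```python
import json
import sys

LIMIT = 100  # Page size; assumed to be the most the API returns per call


def sync_characters(fetch_page, limit=LIMIT, out=sys.stdout):
    """Page through the API and write one Singer RECORD line per character.

    fetch_page(offset=..., limit=...) must return the decoded JSON response.
    Returns the number of records written.
    """
    offset = 0
    written = 0
    while True:
        page = fetch_page(offset=offset, limit=limit)
        results = page["data"]["results"]
        if not results:
            break  # Nothing came back, so we are done
        for row in results:
            message = {"type": "RECORD", "stream": "characters", "record": row}
            out.write(json.dumps(message) + "\n")
            written += 1
        offset += len(results)
        if offset >= page["data"]["total"]:
            break  # We have seen every row the API says exists
    return written
```

In a real tap, fetch_page would wrap the HTTP request and add the authentication parameters alongside offset and limit.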
Finally, we tell Python to run our main function:
To invoke the script, head over to the terminal and run:
python3 marvel.py -c config.json | target-csv
Voilà! You should now have character data in a characters.csv file:
I found links to images of characters I have never in my life seen:
Somebody’s not happy
Who is this guy and why does he not have his own comic?
Here’s a quick data visualization we made from the data using Raw. This circle pack shows the name of a comic mapped to the number of comics available. The popularity of Wolverine isn’t surprising, but I’ve never heard of Squadron Sinister or Strong Guy, yet there are a number of comics for them:
Comic titles in the Marvel universe
Here's another visualization – a tree map of the first 50 rows that maps the name of the comic to the number of issues in the collection.
In summary, by writing some simple Singer code we were able to create a treasure trove of Marvel nerdiness in an easily consumable CSV ready to be imported, analyzed, or whatever else your heart might desire.