Tapping into the Marvel API

Singer is all about helping you work with data more easily – from any data source to any destination. Want to get API data into a CSV file? Cool. A SaaS data set into a Google Sheet? No problem.

Singer is an open source standard for writing scripts that move data. You write what Singer calls a tap to extract data from a source, then pipe the tap's output into a target, which loads the data into a destination.

To illustrate, let's build a tap for the Marvel API and load its data into a CSV file.

Marvel documentation

For this example we’ll call the characters endpoint and sort the results by name, as described in Marvel's interactive Swagger documentation.
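The request behind this walkthrough boils down to a single URL. Here is a minimal sketch of building it, assuming the public gateway host `https://gateway.marvel.com` and the `/v1/public/characters` path from Marvel's documentation. The helper name `characters_url` is hypothetical, and the authentication parameters every real call needs are left out here.

```python
from urllib.parse import urlencode

# Characters endpoint of the public Marvel API
BASE_URL = "https://gateway.marvel.com/v1/public/characters"


def characters_url(extra_params=None):
    """Build a characters request URL sorted by name (hypothetical helper)."""
    params = {"orderBy": "name"}  # sort results alphabetically by character name
    if extra_params:
        params.update(extra_params)  # e.g. paging or authentication parameters
    return BASE_URL + "?" + urlencode(params)


print(characters_url({"limit": 100}))
# https://gateway.marvel.com/v1/public/characters?orderBy=name&limit=100
```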

If you’d like to play along, you’ll need your own Marvel API keys (a public key and a private key), which you can get by registering on the Marvel developer site.

Introducing Singer


On the Singer side, check out the Singer getting started documentation – notably the part about developing a Python tap:

Singer documentation

Singer needs a schema that describes the data. The schema is written in JSON Schema, and the singer-python library's write_schema function emits it so the target knows what to expect. The tap then calls write_records to send the data itself, which amounts to saying, “Now that you know what our data looks like, here it comes.”
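Under the hood, a tap prints JSON messages to standard output, one per line, and the target reads them from standard input. The sketch below hand-rolls the two message types described above with only the standard library, to show roughly what singer-python's `write_schema` and `write_records` emit. The stream name `characters` and the two-field schema are illustrative assumptions; in a real tap you would call the singer-python functions instead of these stand-ins.

```python
import json
import sys


def schema_message(stream, schema, key_properties):
    """Return a Singer SCHEMA message as one line of JSON."""
    return json.dumps({
        "type": "SCHEMA",
        "stream": stream,
        "schema": schema,
        "key_properties": key_properties,
    })


def record_message(stream, record):
    """Return a Singer RECORD message as one line of JSON."""
    return json.dumps({"type": "RECORD", "stream": stream, "record": record})


def write_schema(stream, schema, key_properties):
    # Stand-in for singer.write_schema: the schema goes out before any records
    sys.stdout.write(schema_message(stream, schema, key_properties) + "\n")


def write_records(stream, records):
    # Stand-in for singer.write_records: one RECORD line per record
    for record in records:
        sys.stdout.write(record_message(stream, record) + "\n")


# Illustrative stream and schema (assumptions, not the tap's actual schema)
SCHEMA = {
    "type": "object",
    "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
}

if __name__ == "__main__":
    write_schema("characters", SCHEMA, ["id"])
    write_records("characters", [{"id": 1, "name": "Example Hero"}])
```

Because everything travels as newline-delimited JSON on stdout, any Singer target can consume it through an ordinary shell pipe.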

The actual tap

Let’s walk through the marvel.py file, which you can find in the Marvel tap on GitHub.

Marvel tap

After the imports, the code takes care of two setup steps:

  1. Loading our API keys from a config file (check out config_example.json)

  2. Parsing the command-line arguments. This matters because when we run the tap we need to tell it where the config file is; the kind of output we get is then decided by the target we pipe the tap into. We’ll see the full command at the end.
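A minimal sketch of these two setup steps using the standard library's `argparse` and `json`. The key names `public_key` and `private_key` are assumptions for illustration; check config_example.json in the tap for the names it actually expects. (singer-python also ships a `singer.utils.parse_args` helper that handles a `-c`/`--config` flag and loads the file for you.)

```python
import argparse
import json

# Assumed config keys; see config_example.json for the real ones
REQUIRED_KEYS = ("public_key", "private_key")


def load_config(argv=None):
    """Parse -c/--config from the command line and return the loaded config dict."""
    parser = argparse.ArgumentParser(description="Marvel tap (sketch)")
    parser.add_argument("-c", "--config", required=True, help="Path to the config JSON file")
    args = parser.parse_args(argv)

    with open(args.config) as config_file:
        config = json.load(config_file)

    # Fail early if a key we need is missing
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("Config is missing keys: " + ", ".join(missing))
    return config
```

Passing `argv` explicitly is optional; when it is `None`, `argparse` reads `sys.argv`, so `python3 marvel.py -c config.json` works unchanged.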

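Next comes the Marvel API part of the script. Marvel's documentation requires three authentication parameters on every call: a timestamp (ts), our public key (apikey), and a hash, which is the MD5 digest of the timestamp, the private key, and the public key concatenated in that order.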
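A minimal sketch of computing those three parameters with the standard library. The helper name `auth_params` is hypothetical, and the key strings in the example call are placeholders, not real credentials.

```python
import hashlib
import time


def auth_params(public_key, private_key, ts=None):
    """Return the ts, apikey and hash parameters Marvel expects on each request."""
    if ts is None:
        ts = str(int(time.time()))  # any string that changes from request to request
    # hash = md5(ts + privateKey + publicKey), as a lowercase hex string
    digest = hashlib.md5((ts + private_key + public_key).encode("utf-8")).hexdigest()
    return {"ts": ts, "apikey": public_key, "hash": digest}


print(auth_params("my_public_key", "my_private_key", ts="1"))
```

Only the public key travels in the clear; the private key is used solely as an input to the hash.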

There are also limits on how many results each call can return, which we store in a variable.
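Marvel's documentation caps the `limit` parameter at 100 results per call, so a natural choice is to keep that number in one place and step through the collection with the `offset` parameter. The names `LIMIT` and `page_offsets` are hypothetical; here is a minimal sketch of the arithmetic:

```python
# Marvel allows at most 100 results per call via the `limit` parameter
LIMIT = 100


def page_offsets(total, limit=LIMIT):
    """Return the offset of each page needed to cover `total` results."""
    return list(range(0, total, limit))


print(page_offsets(250))  # [0, 100, 200]
```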

Next, we set up our JSON schema:

JSON schema code
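The tap's actual schema isn't reproduced here, so treat the following as a hypothetical example of what a characters schema could look like, written as a Python dict in JSON Schema terms. The field selection and the `to_record` helper are assumptions; the raw field names (`thumbnail.path`, `thumbnail.extension`, `comics.available`) follow Marvel's character object. Flattening those nested objects into plain columns is one way to keep the CSV simple.

```python
# Hypothetical schema for the characters stream; the real tap may pick
# different fields. Nullable types let records leave empty values as null.
CHARACTER_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "description": {"type": ["null", "string"]},
        "thumbnail": {"type": ["null", "string"]},
        "comics_available": {"type": ["null", "integer"]},
    },
}


def to_record(character):
    """Flatten one raw character from the API into the schema's shape."""
    thumb = character.get("thumbnail") or {}
    thumbnail_url = None
    if thumb.get("path") and thumb.get("extension"):
        # Full-size image URL is path + "." + extension
        thumbnail_url = thumb["path"] + "." + thumb["extension"]
    return {
        "id": character["id"],
        "name": character["name"],
        "description": character.get("description") or None,  # "" becomes null
        "thumbnail": thumbnail_url,
        "comics_available": (character.get("comics") or {}).get("available"),
    }
```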

Now we're ready to call the API. The Marvel API limits how many results a single call returns, so we use a while loop that keeps asking for the next page of rows until there aren’t any more. Each pass through the loop calls the Marvel API with all the parameters it requires, then writes records from the results the call returns; the target is what turns those records into the CSV file.
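A minimal sketch of that loop, assuming the response layout from Marvel's documentation (`data.results` and `data.total`). The function names are hypothetical. `fetch_page` is passed in as a parameter so the loop can be exercised without network access, and `http_fetch_page` shows one way to call the real endpoint with `urllib`, where `auth` is a dict holding the ts, apikey, and hash parameters.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://gateway.marvel.com/v1/public/characters"


def http_fetch_page(offset, limit, auth):
    """Call the characters endpoint for one page; `auth` holds ts, apikey and hash."""
    params = dict(auth, orderBy="name", limit=limit, offset=offset)
    url = BASE_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def sync_characters(fetch_page, emit, limit=100):
    """Page through all characters, passing each page's results to `emit`."""
    offset = 0
    while True:
        data = fetch_page(offset, limit)["data"]
        results = data["results"]
        if not results:
            break  # nothing left to read
        emit(results)  # in the tap: singer.write_records("characters", results)
        offset += len(results)
        if offset >= data["total"]:
            break  # every character has been seen
    return offset
```

In the tap itself, the call might look like `sync_characters(lambda o, l: http_fetch_page(o, l, auth), emit_fn)`, where `emit_fn` writes the records with singer-python. Stopping on either an empty page or the reported total guards against looping forever if the two ever disagree.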

Finally, we tell Python to run our main function.
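The usual Python idiom for this last step is the `__main__` guard, which calls `main()` only when the file is run as a script and not when it is imported. A skeletal sketch; the body of `main` here is a placeholder for the setup and paging steps described above.

```python
import sys


def main(argv=None):
    """Entry point for the tap (placeholder body)."""
    args = sys.argv[1:] if argv is None else argv
    # The real tap would load its config, write the schema and page
    # through the API here; this sketch just reports what it was given.
    print("tap started with arguments:", args)
    return args


if __name__ == "__main__":
    main()
```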

To invoke the script, open a terminal and run:

python3 marvel.py -c config.json | target-csv

Voilà! You should now have character data in a characters.csv file:

CSV file

I found links to images of characters I have never in my life seen:

Somebody’s not happy

Whoa! Who is this guy and why does he not have his own comic?

Here’s a quick data visualization we made from the data using Raw. This circle pack maps each character's name to the number of comics available for that character. The popularity of Wolverine isn’t surprising, but I’d never heard of Squadron Sinister or Strong Guy, yet there are a number of comics featuring them:

Comic titles in the Marvel universe
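If you’d like to check the numbers behind a chart like this before reaching for a visualization tool, a few lines of Python will do. This is a sketch, assuming the CSV has `name` and `comics_available` columns; those column names are hypothetical, so substitute whatever headers your characters file actually has.

```python
import csv
import io


def top_characters(csv_text, n=10):
    """Return (name, comics_available) pairs for the n characters with the most comics."""
    rows = csv.DictReader(io.StringIO(csv_text))
    counts = [
        (row["name"], int(row["comics_available"]))
        for row in rows
        if row.get("comics_available")  # skip rows with no count
    ]
    # Highest counts first; ties broken alphabetically by name
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts[:n]
```

To run it on the real file, read it first, e.g. `with open("characters.csv", newline="") as f: print(top_characters(f.read()))`. Taking text rather than a path keeps the function easy to test.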

Here's another visualization – a tree map of the first 50 rows that maps each character's name to the number of comics available for that character.

Tree map

In summary, by writing some simple Singer code we were able to create a treasure trove of Marvel nerdiness in an easily consumable CSV ready to be imported, analyzed, or whatever else your heart might desire.

Sign up for the Singer release notes for news about Singer updates, or join the Singer Slack workspace.