Tutorial: Using Metabase with a data warehouse for analytics

How do you begin combining data from cloud applications with your internal databases to gain insight into your business? Maybe your organization has already standardized on Metabase as your analytics tool, but you're still learning about using it with multiple data sources.

To analyze data from diverse sources, you need a data warehouse that consolidates all of your data in a single location. Most businesses take advantage of cloud data warehouses such as Amazon Redshift, Google BigQuery, or Snowflake.

You can extract data that you have stored in SaaS applications and databases and load it into the data warehouse using an ETL (extract, transform, load) tool. Once the data is available, your analysts can use it to create reports. In this post, we'll look at how to start from scratch and create a report using Metabase, an open source business intelligence (BI) tool that's free to download and use.

Metabase's visual query builder lets you generate simple charts and dashboards, or you can use SQL to create more complex visualizations, as we'll do here. Each query starts by clicking a button to "ask a question." Metabase is simpler than tools like Tableau and Power BI, offering fewer features, but it's correspondingly simpler to learn.

Reporting, data warehouses, and ETL

Metabase can run in the environment of your choice via a Docker image, on AWS Elastic Beanstalk, or on Heroku, or you can run it as a native application on macOS or as a Java jar file. I used macOS version 0.31.1.

Per its FAQ, "Metabase is primarily meant to work with actual databases." If you want to analyze data in SaaS platforms, the developers "suggest that you use other tools to build a data warehouse with the data you need."

That's what we'll do with Stitch, a simple, powerful ETL service for businesses of all sizes, up to and including the enterprise. It can move data from dozens of data sources. Sign up and you can be moving data to a data warehouse in five minutes. Choose at least one of Stitch's integrations as a data source. Most businesses get more value out of their data as they integrate more data sources; for our example, we'll look at data from two separate sources.

Setting up a data warehouse

I used some of Stitch's real data to build a data visualization for this post. Specifically, I was curious to see how many support conversations came from each of Stitch's pricing tiers. To find out, I'd need to use billing data stored in Salesforce and join it with support conversation data from Intercom to create a visualization of support conversations.

My first step was to set up a BigQuery data warehouse to store the data from the two SaaS platforms. Following our documentation, I added to my Google user the permissions that Stitch would need to load data to BigQuery.

Using Stitch makes extracting data from a source and loading it into a data warehouse easy. I logged in to Stitch and added new integrations for Salesforce and Intercom, following the instructions in our documentation. From Salesforce I selected the Account table to replicate. Stitch's Intercom integration automatically extracts data from all available tables. From both, I chose the period from which to replicate data.

Add-Integration

Once I'd set up my integrations I added BigQuery as my destination and specified a BigQuery project name. Within a few minutes, Stitch extracted the data I specified and loaded it into BigQuery.

Add-Destination

Building a sample report

Next I turned to Metabase.

To start, I clicked on Metabase's settings icon, then Admin. I chose Add a Database and entered the names of my BigQuery project ID and other information, which I obtained from the BigQuery dashboard. As part of that process I had to generate an OAuth 2.0 Client ID and Client Secret — a simple process that involved entering some simple information on a screen connected to a Click here link. Similarly, I generated an Auth Code by providing my Google authentication information.

MB-ss-1

Once you have a database to work with, you can "ask a question" in one of three ways:

MB-ss-2

If you choose Metrics, Metabase directs you to create segments and metrics from its admin panel. You can choose Custom for slightly more complex queries on a single table. I wanted to join data from two tables, but a Metabase blog post says:

Our goal with Metabase has always been to provide a way for non-technical users to answer their own questions in a self-serve manner. While joins are a great tool that a skilled analyst or programmer might reach for, we will be trying to add features that expose a highly specific, easily understood operation that someone who isn't SQL fluent would understand.

Fortunately, we're SQL-fluent around here. I chose Native Query, which lets you paste in SQL code.

A look at the Intercom schema showed me that to associate the number of conversations with the Stitch customer ID, I'd need to join the Intercom Users and Conversations tables. The Users table contains a field called companies that contains information about the companies a given user is associated with. That field is a list that could contain multiple values – a nested data structure, in other words. Many data warehouses don't support nested data structures, but BigQuery does. It stores each list in an array.

Retrieving data from a nested data structure is tricky if you're used to working with fully normalized data. My colleague Erin, Stitch's senior technical documentation manager, set me on the right path with this query:

SELECT
  users.id AS user_id,
  company.id AS company_id,
  company.company_id AS company
FROM
  `reporting01-216119.intercomintegration.users` AS users,
  UNNEST(companies.companies) AS company
WHERE
  company.company_id IS NOT NULL

The SQL UNNEST operator takes an array and returns a table, with one row for each element in the array.

To get the plan tier information from our Salesforce Account table, I created another query:

SELECT
  CAST(cid__c AS STRING) AS stitchid,
  tier__c
FROM
  `reporting01-216119.salesforceintegration.Account` AS account
WHERE
  account.isdeleted IS FALSE
  AND tier__c IS NOT NULL

I used those those queries, along with data from Intercom's Conversations table, to retrieve the data I wanted, joining Intercom and Salesforce data. I used a case clause to consolidate data from monthly and annual billing plan into a single bar for each pricing tier:

WITH
  userco AS (
  SELECT
    users.id AS user_id,
    company.id AS company_id,
    company.company_id AS stitch_id
  FROM
    `reporting01-216119.intercomintegration.users` AS users,
    UNNEST(companies.companies) AS company
  WHERE
    company.company_id IS NOT NULL),

  sf AS (
  SELECT
    CAST(cid__c AS STRING) AS stitchid,
    case
      when tier__c LIKE '%Starter%' then 'Starter'
      when tier__c LIKE '%Basic%' then 'Basic'
      when tier__c LIKE '%Premier%' then 'Premier'
      when tier__c LIKE '%Custom%' then 'Enterprise'
      else tier__c
      end as tier
  FROM
    `reporting01-216119.salesforceintegration.Account` AS account
  WHERE
    account.isdeleted IS FALSE
    AND tier__c IS NOT NULL)

SELECT
  conversations.id AS convo_id,
  count(conversations.id) AS convo_count,
  conversations.user.id AS convo_user_id,
  userco.user_id,
  userco.company_id,
  userco.stitch_id,
  tier
FROM
  `reporting01-216119.intercomintegration.conversations` AS conversations
JOIN
  userco
ON
  conversations.user.id = userco.user_id
JOIN
  sf
ON
  stitchid = stitch_id
WHERE
  conversations.user.type != "lead"
GROUP BY tier, convo_id, convo_user_id, user_id, company_id, stitch_id
ORDER BY convo_count

Running this query in the BigData console gave me the data I wanted. Pasting it into Metabase gave me a table of data. To turn it into a visualization, I selected Bar from the Visualization drop-down and chose tier as the X axis. The result was a useful visualization.

MB-ss-3

Metabase lets you customize bar colors and axis labels. When you're satisfied with your visualization, you can save it in a collection and publish it to a dashboard that other users can share.

So there you have it – a quick walk through the process of using an ETL tool like Stitch to move data from multiple sources into a data warehouse, then report on it using Metabase. Sign up for a free trial of Stitch and start creating your own.