Mixpanel integration summary

Stitch’s Mixpanel integration replicates data using the Mixpanel Event Export API & Mixpanel Query API.. Refer to the Schema section for a list of objects available for replication.

Mixpanel feature snapshot

A high-level look at Stitch's Mixpanel (v1) integration, including release status, useful links, and the features supported in Stitch.

STITCH
Release status

Beta

Supported by

Stitch

Stitch plan

Free

API availability

Available

Singer GitHub repository

singer-io/tap-mixpanel

REPLICATION SETTINGS
Anchor Scheduling

Supported

Advanced Scheduling

Supported

Table-level reset

Unsupported

Configurable Replication Methods

Unsupported

DATA SELECTION
Table selection

Supported

Column selection

Supported

TRANSPARENCY
Extraction Logs

Supported

Loading Reports

Supported

Connecting Mixpanel

Mixpanel setup requirements

To set up Mixpanel in Stitch, you need:

  • Admin privileges. Your role in your Mixpanel account must be admin in order to be able to retrieve your API secret.

Step 1: Retrieve your Mixpanel project timezone and API secret

  1. Login to your Mixpanel account.
  2. In the dropdown menu in the upper left corner of the page, select the project you want to replicate data from.
  3. Hover over the Settings icon in the upper right corner. In the PROJECT SETTINGS portion of the dropdown menu, click on the link with the name of your project.
  4. Copy the Project Timezone and API Secret, and paste those values someplace safe to use for the next step.

Step 2: Add Mixpanel as a Stitch data source

  1. Sign into your Stitch account.
  2. On the Stitch Dashboard page, click the Add Integration button.

  3. Click the Mixpanel icon.

  4. Enter a name for the integration. This is the name that will display on the Stitch Dashboard for the integration; it’ll also be used to create the schema in your destination.

    For example, the name “Stitch Mixpanel” would create a schema called stitch_mixpanel in the destination. Note: Schema names cannot be changed after you save the integration.

  5. In the API Secret field, paste the API Secret you retrieved from Step 1.
  6. In the Attribution Window field, enter the number of days you want your tables’ attribution window to be. For more information on attribution windows, refer to the Replication section.
  7. In the Date Window Size field, enter the number of days desired for a date looping window for the exports, funnels, and revenues tables.

    Date looping will return records whose from_date and to_date fall between the number of days in the defined window size.

    Note: If your project has large volumes of events, you may want to set the number of days to 14, 7, or even to 1 or 2 days.

  8. In the Project Timezone field, paste the Project Timezone you retrieved from Step 1.
  9. Optional: In the Select Properties By Default, enter true to capture new properties in the events and engage tables’ records. If set to false or left blank, new properties will be ignored.

Step 3:

The Sync Historical Data setting defines the starting date for your Mixpanel integration. This means that:

  • For tables using Key-based Incremental Replication, data equal to or newer than this date will be replicated to your destination.
  • For tables using Full Table Replication, all data - including records that are older, equal to, or newer than this date - will be replicated to your destination.

For a detailed look at historical replication jobs, check out the Syncing Historical SaaS Data guide.

Note: Mixpanel limits the number of days historical data may be accessed, depending on your Mixpanel plan. If you select a Start Date greater than what your Mixpanel account has access to, Stitch may encounter issues with Mixpanel’s API.

For example: If you have a Starter Free Mixpanel plan, you have access to 90 days of historical data (as of 06/5/2020). If you select a Start Date greater than 90 days, Mixpanel’s API may return an error.

Additionally, the Start Date must be less than or equal to 365 days. If a Start Date greater than 365 days is selected, Stitch will reset the Start Date to 365 days during Extraction.

Refer to Mixpanel’s documentation for more info and to check your Mixpanel account’s historical data access limit.

Step 4: Create a replication schedule

In the Replication Frequency section, you’ll create the integration’s replication schedule. An integration’s replication schedule determines how often Stitch runs a replication job, and the time that job begins.

Mixpanel integrations support the following replication scheduling methods:

To keep your row usage low, consider setting the integration to replicate less frequently. See the Understanding and Reducing Your Row Usage guide for tips on reducing your usage.

Step 5: Set tables and columns to replicate

To complete the setup, you’ll need to select the tables and columns you want to replicate to your destination.

Check out the Schema section to learn more about the available tables in Mixpanel and how they replicate.

  1. In the list of tables that displays - or in the Tables to Replicate tab, if you skipped this step during setup - locate a table you want to replicate.
  2. To track a table, click the checkbox next to the table’s name. A green checkmark means the table is set to replicate.

  3. To track a column, click the checkbox next to the column’s name. A green checkmark means the column is set to replicate.

  4. Repeat this process for all the tables and columns you want to replicate.
  5. When finished, click the Finalize Your Selections button at the bottom of the screen to save your selections.

Note: If you change these settings while a replication job is still in progress, they will not be used until the next job starts.

Initial and historical replication jobs

After you finish setting up Mixpanel, its Sync Status may show as Pending on either the Stitch Dashboard or in the Integration Details page.

For a new integration, a Pending status indicates that Stitch is in the process of scheduling the initial replication job for the integration. This may take some time to complete.

Free historical data loads

The first seven days of replication, beginning when data is first replicated, are free. Rows replicated from the new integration during this time won’t count towards your quota. Stitch offers this as a way of testing new integrations, measuring usage, and ensuring historical data volumes don’t quickly consume your quota.


Mixpanel replication

Attribution windows and data extraction

When Stitch runs a replication job for Mixpanel, it will use the value of the Attribution Window setting to query for and extract data for tables using Key-based Incremental Replication. An attribution window is a period of time for attributing results to ads and the lookback period after those actions occur during which ad results are counted.

For example: If set to 5 days, Stitch will replicate the past five days’ worth of data for applicable tables every time a replication job runs. While Stitch replicates data in this way to account for updates to records made during the attribution window, it can have a substantial impact on your overall row usage.

In the sections below are examples of how attribution windows impact how Stitch extracts data during historical and ongoing replication jobs.

For historical and full re-replications of Mixpanel data, Stitch will query for and extract data newer than or equal to the date defined in the Start Date field in the Integration Settings page.

The Start Date, in conjunction with the Attribution Window, defines the minimum date Stitch should query for when extracting historical data. This is calculated as:

Start Date - Attribution Window = Minimum Extraction Date

Example

During the initial set up, the Attribution Window and Start Date settings are defined as:

  • Attribution Window: 5 days (default setting)
  • Start Date: 07/03/2017, or 2017-07-03 00:00:00

To account for the Attribution Window, Stitch would calculate the Minimum Extraction Date value as: 2017-07-03 00:00:00 - 5 days = 2017-06-28 00:00:00

If you were to write a SQL query using this date for the revenue table, it might look like this:

  SELECT *
    FROM mixpanel.revenue
   WHERE date >= '2017-06-28 00:00:00'   /* Min. Extraction Date */
ORDER BY date

For ongoing replication jobs, Stitch will query for and extract data using the last saved maximum value in the table’s Replication Key column and the Attribution Window for the table.

Note: This applies to every replication job that takes place after the historical replication job.

Example

The last maximum saved Replication Key value for the revenue table is 2017-10-01 00:00:00.

To account for the Attribution Window of 5 days, we’d subtract this from the last maximum saved Replication Key value:

2017-10-01 00:00:00 - 5 days = 2017-09-26 00:00:00

In this case, Stitch would query for and extract data that is newer than or equal to 2017-09-26 00:00:00 and older than or equal to 2017-10-01 00:00:00.

If this were a SQL query, it might look like this:

  SELECT *
    FROM revenue
   WHERE date >= '2017-09-26 00:00:00'
                              /* max Replication Key value - Attribution Window */
     AND date <= '2017-10-01 00:00:00'
                              /* max Replication Key value from previous job */
ORDER BY date

Attribution windows and row count impact

Due to the Attribution Window, a high Replication Frequency may not be necessary. Because Stitch will replicate data from the past N days during every replication job, recent data will be re-replicated and count towards your row quota.

To reduce your row usage and replicating redundant data, consider setting the integration to replicate less frequently. For example: every 12 or 24 hours.


Mixpanel table reference

Replication Method :

Full Table

Primary Key :

date

API endpoint :

Get annotations

The annotations table contains info about annotations.

date
DATE-TIME

The date the annotation occurred.

description
STRING

The description of the annotation.

id
INTEGER

The annotation ID.

project_id
INTEGER

The ID of the project associated with the annotation.

Reference:


Replication Method :

Full Table

Primary Key :

cohort_id : distinct_id

API endpoint :

Engage, fiiter by cohort ID

The cohort_members table contains info about the cohorts user profiles belong to.

cohort_id
INTEGER

The cohort ID.

Reference:

distinct_id
STRING

The individual profile ID.


Replication Method :

Full Table

Primary Key :

id

API endpoint :

List cohorts

The cohorts table contains info about the cohorts in a Mixpanel project.

id
INTEGER

The cohort ID.

Reference:

count
INTEGER

The number of users in the cohort.

created
DATE-TIME

The time the cohort was created, in yyy-mm-dd hh:mm:ss format.

description
STRING

The description of the cohort.

is_visible
INTEGER

If 1, the cohort is visible. If 0, the cohort is hidden.

name
STRING

The name of the cohort.

project_id
INTEGER

The project ID.

Reference:


Replication Method :

Full Table

Primary Key :

distinct_id

API endpoint :

Export formatted data (Engage)

The engage table contains info about user profiles.

The schema for this table is dynamic, meaning that the columns Stitch detects are dependent upon the properties provided upon upload in Mixpanel. For every property available in Mixpanel for engage records, Stitch will display a column in the integration’s Tables to Replicate tab.

distinct_id
STRING

The profile ID.

OTHER_ATTRIBUTES
VARIES

For every property available in Mixpanel for engage records, Stitch will display a column in the integration’s Tables to Replicate tab.


Replication Method :

Key-based Incremental

Replication Key :

time

Primary Key :

distinct_id : event : time

API endpoint :

Export raw data

The export table contains “raw data dumps” of tracked events.

The schema for this table is dynamic, meaning that the columns Stitch detects are dependent upon the properties provided upon upload in Mixpanel. For every property available in Mixpanel for export records, Stitch will display a column in the integration’s Tables to Replicate tab.

Note: This table is replicated using the Attribution Window value defined in the integration’s settings page. Refer to the Replication section for more info.

distinct_id
STRING

event
STRING

The event. For example: Viewed report

time
DATE-TIME

The time the event occurred.

dataset
STRING

labels
NULL

sampling_factor
INTEGER

OTHER_ATTRIBUTES
VARIES

For every property available in Mixpanel for export records, Stitch will display a column in the integration’s Tables to Replicate tab.


Replication Method :

Key-based Incremental

Replication Key :

date

Primary Key :

funnel_id : date

API endpoint :

Get funnels

The funnels table contains data about your Mixpanel funnels, segmented by funnel and day.

Note: This table is replicated using the Attribution Window value defined in the integration’s settings page. Refer to the Replication section for more info.

funnel_id
INTEGER

The funnel ID.

date
DATE

The date the data is for.

analysis
OBJECT

Performance statistics about the funnel for the given date.

completion
INTEGER

starting_amount
INTEGER

steps
INTEGER

worst
INTEGER

funnels (table), analysis (attribute)

datetime
DATE-TIME

The date the data is for.

name
STRING

The name of the funnel.

steps
ARRAY

Performance statistics about the steps in the funnel.

count
INTEGER

avg_time
NUMBER

goal
STRING

overall_conv_ratio
NUMBER

event
STRING

step_label
STRING

time_buckets_from_start
OBJECT

lower
INTEGER

higher
INTEGER

buckets
ARRAY

value
INTEGER

funnels (table), buckets (attribute)
funnels (table), time_buckets_from_start (attribute)

time_buckets_from_prev
OBJECT

lower
INTEGER

higher
INTEGER

buckets
ARRAY

value
INTEGER

funnels (table), buckets (attribute)
funnels (table), time_buckets_from_prev (attribute)
funnels (table), steps (attribute)

Replication Method :

Key-based Incremental

Replication Key :

date

Primary Key :

date

API endpoint :

Export formatted data (Revenue)

The revenue table contains info about revenue, segmented by day.

Note: This table is replicated using the Attribution Window value defined in the integration’s settings page. Refer to the Replication section for more info.

date
DATE

The day the revenue data is for.

amount
NUMBER

count
INTEGER

datetime
DATE-TIME

paid_count
INTEGER



Questions? Feedback?

Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.