When you connect a SaaS integration, Stitch will begin replicating not only that integration’s recent data, but its historical data as well. During the setup of the integration, you can choose the start date by using Stitch’s default starting date or by defining your own custom date.


Historical data loads and Replication Keys

The default starting date (or a custom date, if you define one) essentially sets the Replication Keys for the Incremental tables in the integration. This tells Stitch how far back in time to query for historical data.

Note: Any tables using Full Table Replication will still replicate in full during every replication job, even during the initial job.
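
As a rough illustration of the idea (not Stitch’s actual implementation), the sketch below treats the start date as the first bookmark for an Incremental table, while Full Table Replication ignores it. The `extract` function, the `updated_at` field, and the sample rows are all hypothetical.

```python
from datetime import datetime

# Hypothetical source rows; `updated_at` stands in for a Replication Key.
ROWS = [
    {"id": 1, "updated_at": datetime(2017, 6, 1)},
    {"id": 2, "updated_at": datetime(2018, 3, 15)},
    {"id": 3, "updated_at": datetime(2018, 11, 30)},
]

def extract(rows, method, start_date, bookmark=None):
    """Return the rows a single replication job would pick up."""
    if method == "full_table":
        # Full Table Replication ignores the start date and any bookmark.
        return list(rows)
    # Incremental: the first job falls back to the start date as its bookmark.
    since = bookmark if bookmark is not None else start_date
    return [r for r in rows if r["updated_at"] >= since]

start = datetime(2018, 1, 22)
incremental_ids = [r["id"] for r in extract(ROWS, "incremental", start)]
full_table_ids = [r["id"] for r in extract(ROWS, "full_table", start)]
print(incremental_ids)  # [2, 3]
print(full_table_ids)   # [1, 2, 3]
```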

Unless you define a different starting date for an integration, Stitch will use the integration’s default starting date.

[Image: Selecting a custom start date in the Integration Settings page of the Stitch app]

The majority of integrations have a default starting date of -1 year from the date the integration is created. For example: if you use the integration’s default date of -1 year and you create the integration on January 22, 2019, Stitch will queue a historical replication job for data created or updated between January 22, 2018 and January 22, 2019.
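
The date arithmetic in that example can be sketched as follows. This is only an illustration: `default_window` is a hypothetical helper, and the fallback to February 28 for integrations created on February 29 is an assumption, not documented Stitch behavior.

```python
from datetime import date

def default_window(created_on, years_back=1):
    """Return (start, end) for a start date of -N years from creation."""
    try:
        start = created_on.replace(year=created_on.year - years_back)
    except ValueError:
        # Assumption: February 29 falls back to February 28 in a non-leap year.
        start = created_on.replace(year=created_on.year - years_back, day=28)
    return start, created_on

window = default_window(date(2019, 1, 22))
print(window[0].isoformat(), window[1].isoformat())  # 2018-01-22 2019-01-22
```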

Default starting dates

The table below is a rollup of the default start dates for all SaaS integrations.

To see a list of an integration’s tables and the Replication Methods they use, click the integration name.

Integration Default starting date
3PL Central 1 year
ActiveCampaign 1 year
AdRoll 1 year
Amplitude n/a
AppsFlyer 60 days
Asana 1 year
Autopilot 1 year
BigCommerce 1 year
Braintree 1 year
Bronto 1 year
Campaign Manager 1 year
Campaign Monitor 1 year
Chargebee 1 year
Chargify 1 year
CircleCI 1 year
Close.io 1 year
Club Speed 1 year
Codat 1 year
COVID-19 Public Data 1 year
Crossbeam 1 year
Darksky 1 year
Deputy 1 year
Desk 1 year
Dixa 1 year
Doorbell.io n/a
eBay 1 year
Eloqua 1 year
Facebook Ads 1 year
Freshdesk 1 year
Front 1 year
FullStory 1 year
Google Analytics 360 1 year
Google Analytics 4 1 year
GitHub 1 year
GitLab 1 year
Google Ads (AdWords) 30 days
Google Ads 30 days
Google Analytics (AdWords) 30 days
Google Analytics 1 year
Google ECommerce 15 days
Google Search Console 1 year
Google Sheets 1 year
Harvest Forecast 1 year
Harvest 1 year
Heap 1 year
Help Scout 1 year
HubSpot 30 days
iLEVEL 1 year
Impact 1 year
Intacct 1 year
Intercom 1 year
Invoiced 1 year
Iterable 1 year
JIRA 1 year
Klaviyo 1 year
Kustomer 1 year
Lever 1 year
LinkedIn Ads 1 year
Listrak 1 year
LivePerson 1 year
Looker 1 year
LookML 1 year
MailChimp 1 year
Mailshake 1 year
Mambu 1 year
Marketo 1 year
Microsoft Advertising 1 year
Mixpanel 1 year
Microsoft Teams 1 year
NetSuite Suite Analytics 1 year
NetSuite 1 year
Onfleet 1 year
Outbrain 1 year
Outreach 1 year
Pardot 1 year
Pendo 1 year
Pepperjam 1 year
Pipedrive 1 year
Quick Base 1 year
QuickBooks 1 year
Recharge 1 year
Recurly 1 year
Referral SaaSquatch 1 year
Revinate 1 year
RingCentral 1 year
SaaSOptics 1 year
Sailthru 1 year
Salesforce Marketing Cloud 1 year
Salesforce 1 year
Selligent 1 year
SendGrid Core 1 year
ShipHero 1 year
Shippo 1 year
Shopify 1 year
Slack 1 year
Snapchat Ads 1 year
Square 1 year
Stripe 1 year
SurveyMonkey 1 year
Taboola 1 year
TikTok Ads 1 year
Toggl 1 year
Trello 1 year
Twilio 1 year
Twitter Ads 1 year
Typeform 1 year
UJET 1 year
Urban Airship 1 year
UserVoice 1 year
Wootric 1 year
Workday RaaS 1 year
Xero 1 year
Yotpo 1 year
Zendesk Chat 1 year
Zendesk Support 1 year
Zoom
Zuora 1 year
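
The entries in this table mix years, days, and n/a. As a minimal sketch of how such a value might be turned into a concrete date, the hypothetical helper below parses an entry relative to the integration’s creation date. Treating a year as 365 days is a simplifying assumption, and returning `None` for n/a is only a placeholder for "no default start date".

```python
from datetime import date, timedelta

def start_date_from_default(default, created_on):
    """Turn an entry such as '1 year' or '60 days' into a date, or None for 'n/a'."""
    if default.strip().lower() == "n/a":
        return None
    amount, unit = default.split()
    amount = int(amount)
    if unit.startswith("year"):
        # Simplifying assumption: one year is treated as 365 days.
        return created_on - timedelta(days=365 * amount)
    if unit.startswith("day"):
        return created_on - timedelta(days=amount)
    raise ValueError(f"Unrecognized default: {default!r}")

created = date(2019, 1, 22)
hubspot_start = start_date_from_default("30 days", created)
appsflyer_start = start_date_from_default("60 days", created)
amplitude_start = start_date_from_default("n/a", created)
print(hubspot_start, appsflyer_start, amplitude_start)  # 2018-12-23 2018-11-23 None
```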

Uses and considerations

An integration’s start date can be defined when you initially connect the integration to Stitch or at any point afterwards. If the date is changed after the initial setup, the integration’s Replication Keys will be reset and a full re-replication of all the integration’s data will be queued.

Uses

Aside from ensuring Stitch replicates all the historical data you need, changing an integration’s start date can serve several other purposes:

  1. Account for hard-deletes. While we strongly recommend you use soft-deletes whenever possible, the full re-replication triggered by changing an integration’s start date will overwrite the data in your data warehouse. This will remove any hard-deleted records that may exist in your data warehouse but not in the source.
  2. Reset Replication Keys.
  3. Resolve data discrepancies. If you believe you’re missing data, try to narrow it down to a specific timeframe. If that timeframe falls outside the default starting date, this may be the root cause of the discrepancy. Changing the start date for the integration will bring in the data outside the original range.

    If this doesn’t apply, check out the Data Discrepancy Troubleshooting Guide for more data discrepancy troubleshooting tips.
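
The check described in the third point can be sketched as a simple date comparison. `suggest_start_date` is a hypothetical helper for illustration and is not part of Stitch.

```python
from datetime import date

def suggest_start_date(missing_from, current_start):
    """Return an earlier start date if the gap predates the current one, else None."""
    if missing_from < current_start:
        return missing_from
    # The gap is inside the replicated range, so the start date is not the cause.
    return None

current_start = date(2018, 1, 22)
print(suggest_start_date(date(2017, 11, 1), current_start))  # 2017-11-01
print(suggest_start_date(date(2018, 6, 1), current_start))   # None
```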

Considerations

Note that these points shouldn’t cause worry or discourage you from setting up a historical replication job or queueing re-replications. They’re only intended to give you a comprehensive look at the process so you can make an informed decision.

If you have any questions or concerns, reach out to support before changing the start date.

  1. This process cannot be undone. Once a historical replication job is queued, there’s no way to stop it.
  2. Depending on the integration, there may be limitations. Webhook-based integrations like SendGrid, for example, don’t retain historical data. Check out the rollup in the Default Starting Dates section for specifics.
  3. Row usage will spike. Some integrations - like Mixpanel - can contain large (sometimes astronomical) amounts of data. The full re-replication triggered by changing the start date will count toward your row usage.
  4. Recent data may be re-replicated. For example: you set up an integration and the original replication job contained data only for 2016. You are now setting up a historical job for this integration with a start date of January 1, 2015. This will replicate data for all of 2015 and 2016.
  5. You may experience stale data/reports. When a historical replication job runs, no recent data will be retrieved until the replication and loading of the historical data is complete. The volume of data to be replicated and the design of the provider’s API can both affect how long a historical data load will take.

    For example: NetSuite’s API tends to be on the slower side, so it may take some time to complete a full re-replication due to the API design and the sheer amount of data that’s available.

  6. The time a historical replication job takes may be affected by an integration’s API quota. Some integrations - like Salesforce and Marketo - use API quotas, which limit your API usage. While our integrations are designed not to consume all of your available quota, if you’re using the integration’s API somewhere else, this process may use up your quota.

    As Stitch will be unable to continue replicating data once the quota has been consumed, this can extend the length of time the historical replication job will take, thus affecting the freshness of your reports.
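
To make the fourth consideration concrete, the sketch below splits a new job’s date range into the part that is newly added and the part that is replicated again. `split_ranges` is a hypothetical helper; it uses half-open ranges (the end date is excluded) and assumes the new start date falls before the end of the original range.

```python
from datetime import date

def split_ranges(new_start, original_start, original_end):
    """Return (added, repeated) date ranges for a job with a new start date."""
    # Data before the original range is new to the destination.
    added = (new_start, original_start) if new_start < original_start else None
    # Data inside the original range is replicated a second time.
    repeated = (max(new_start, original_start), original_end)
    return added, repeated

added, repeated = split_ranges(
    new_start=date(2015, 1, 1),
    original_start=date(2016, 1, 1),
    original_end=date(2017, 1, 1),
)
print("added:", [d.isoformat() for d in added])        # ['2015-01-01', '2016-01-01']
print("repeated:", [d.isoformat() for d in repeated])  # ['2016-01-01', '2017-01-01']
```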


Changing an integration’s start date

During the initial setup

To use a custom start date during the initial setup:

  1. After defining the rest of the integration’s settings, locate the Sync Historical Data section.
  2. Uncheck the Use Integration Default box.
  3. Define the new starting date using the drop-down.
  4. When finished, click the Save Integration button.

Note: It may take some time for Stitch to perform a structure sync for the integration and begin replicating data. After the structure sync is complete, Stitch will begin replicating data according to the integration’s Replication Schedule.

After the initial setup

  1. From the Stitch Dashboard page, click into the integration.
  2. In the Integration Details page, click the Settings tab, next to Tables to Replicate.
  3. Scroll down to the Sync Historical Data section.
  4. In the Start Date section, click the Change Date link.
  5. Define the new starting date using the drop-down.
  6. Click the Update Settings button.
  7. When prompted, click OK to confirm the change.

If successful, a confirmation message will display indicating the replication job has been queued. After a structure sync is performed, Stitch will begin replicating data according to the integration’s Replication Schedule.


