Our BigQuery destination is currently in Open Beta. As such, the info in this article may change.
Google BigQuery is a fully managed, cloud-based big data analytics web service for processing very large read-only data sets. BigQuery was designed for analyzing data on the order of billions of rows, using a SQL-like syntax.
For more information, check out Google’s BigQuery overview.
In this guide, we'll walk you through:
- BigQuery's pricing model,
- Some high-level limitations (including any incompatible data sources),
- How to spin up a BigQuery data warehouse of your own, and
- How Stitch loads and organizes data in BigQuery.
Unlike many other cloud-based data warehouse solutions, BigQuery’s pricing model is based on usage and not a fixed-rate. This means that your bill can vary over time.
Before fully committing yourself to using BigQuery as your data warehouse, we recommend familiarizing yourself with the BigQuery pricing model and how using Stitch may impact your costs.
Every database has its own supported limits and way of handling data, and BigQuery is no different. The table below provides a very high-level look at what BigQuery supports, including any possible incompatibilities with Stitch’s integration offerings.
BigQuery’s full list; Stitch reserves
|Table Name Length||1,024 characters||Column Name Length||128 characters|
|Max # of Columns||10,000||VARCHAR Max Width||None|
|Nested Structure Support||Native Support||Case||Case Insensitive|
Not sure if BigQuery is the data warehouse for you? Check out the Choosing a Stitch Destination guide to compare each of Stitch’s destination offerings.
To set up BigQuery, Stitch requires:
- A user that:
A United States (US)-based BigQuery instance. Currently, Stitch can only create US-based GCS buckets, which are required for the replication process. US-based buckets are only compatible with US-based BigQuery instances.
This means that instances based in other regions - for example, the EU - will not currently work with Stitch’s BigQuery destination. If you’re interested in Stitch supporting this feature, please let us know.
After you’ve successfully connected your BigQuery data warehouse to Stitch, you can start adding integrations and replicating data.
For each integration that you add to Stitch, a schema specific to that integration will be created in your data warehouse. This is where all the tables for that inegration will be stored.
Stitch will encounter dozens of scenarios when replicating and loading your data. To learn more about how BigQuery handles these scenarios, check out the Data Loading Guide for BigQuery.
Rejected Records Log
Occasionally, Stitch will encounter data that it can’t load into the data warehouse. For example: a table contains more columns than BigQuery’s allowed limit of 10,000 columns per table.
When this happens, the data will be “rejected” and logged in a table called
_sdc_rejected. Every integration schema created by Stitch will include this table as well as the other tables in the integration.
Stitch replicates data from your sources based on the integration’s Replication Frequency and the Replication Method used by the tables in the integration. In Stitch, you have the ability to control what and how often data is replicated for the majority of integrations.
The current release of Stitch’s BigQuery destination uses Append-Only Incremental Replication.
For SaaS and database tables that use Incremental Replication, Stitch will replicate data into BigQuery in an append-only fashion. This means that data that updates an existing row will NOT overwrite the row. Instead, a new row with the new data will be appended to the end of the table.
This means that there can be many different rows in a BigQuery table with the same Primary Key, each representing what the data was at that moment in time.
Querying an append-only table requires a different strategy than you would normally use. For more info, check out the Querying Append-Only Tables guide.
The time from the sync start to data being loaded into your data warehouse can vary depending on a number of factors, especially for initial historical loads.
To learn more about Stitch’s replication process and how loading time can be affected, check out the Stitch Replication Process guide.
Did this article help? If you have questions or feedback, please reach out to us.