Want to improve your BigQuery performance and query costs? When Stitch loads data into BigQuery, tables are created without partitioning or clustering. However, you can apply these performance enhancement tools to your table to streamline query processing, which Stitch will respect on subsequent loads.
In this guide, we’ll walk you through how to add partitioning and clustering to a BigQuery table created by Stitch.
BigQuery’s table partitioning and clustering features can improve query performance and cost by structuring data to match common query patterns.
Before diving in, keep in mind that optimizing for every single query isn’t possible. Tables can only be partitioned by one field, which must be a timestamp or date column, and clustered by a single set of columns.
The ideal choice of partitioning and clustering column(s) depends on the nature of your data and queries.
The required user permissions. The user performing this process must have the permissions outlined in BigQuery’s documentation.
Step 1: Sign into Stitch and the BigQuery Web UI
Sign into Stitch and the BigQuery Web UI to get started.
As an example, we’ll use a table called
orders, which is contained in the
Step 2: Pause Stitch loading
From Stitch, pause all source integrations that contain tables you plan to modify.
You will also need to ensure that Stitch doesn’t load any data while you are modifying the tables. To do this, monitor the Integration Details page for each paused integration until:
- No Extractions are in progress, and
- There are zero rows in Preparing
When the integrations meet these criteria, you can move onto the next step.
Step 3: Create a temporary table with partitioning and clustering
Next, you’ll create a temporary copy of the table with partitioning and clustering added on the
Run the following from the BigQuery Web UI Query Editor:
CREATE TABLE rep_sales.orders_tmp PARTITION BY DATE(created_at) CLUSTER BY created_at AS SELECT * FROM rep_sales.orders
Step 4: Drop the original table and rename the temporary table
To drop the original table, run the following from the BigQuery Web UI:
DROP TABLE rep_sales.orders
You can also rename it and keep it as a backup.
To rename the temporary table, run the following from the BigQuery Web UI:
ALTER TABLE rep_sales.orders_tmp RENAME TO orders
Step 5: Unpause Stitch integrations
Return to Stitch and unpause any integrations that you paused in Step 2.