Want to improve your BigQuery performance and query costs? When Stitch loads data into BigQuery, tables are created without partitioning or clustering. However, you can apply these performance enhancement tools to your table to streamline query processing, which Stitch will respect on subsequent loads.

In this guide, we’ll walk you through how to add partitioning and clustering to a BigQuery table created by Stitch.

Overview

BigQuery’s table partitioning and clustering features can improve query performance and cost by structuring data to match common query patterns.

Learn more in BigQuery’s table partitioning documentation and clustering documentation.

Considerations

Before diving in, keep in mind that optimizing for every single query isn’t possible. Tables can only be partitioned by one field, which must be a timestamp or date column, and clustered by a single set of columns.

The ideal choice of partitioning and clustering column(s) depends on the nature of your data and queries.


Prerequisites


Step 1: Sign into Stitch and the BigQuery Web UI

Sign into Stitch and the BigQuery Web UI to get started.

As an example, we’ll use a table called orders, which is contained in the rep_sales dataset.


Step 2: Pause Stitch loading

From Stitch, pause all source integrations that contain tables you plan to modify.

You will also need to ensure that Stitch doesn’t load any data while you are modifying the tables. To do this, monitor the Integration Details page for each paused integration until:

  1. No Extractions are in progress, and
  2. There are zero rows in Preparing

When the integrations meet these criteria, you can move onto the next step.


Step 3: Create a temporary table with partitioning and clustering

Next, you’ll create a temporary copy of the table with partitioning and clustering added on the created_at column.

Run the following from the BigQuery Web UI Query Editor:

CREATE TABLE rep_sales.orders_tmp
PARTITION BY DATE(created_at)
  CLUSTER BY created_at 
  AS 
    SELECT *
      FROM rep_sales.orders

Step 4: Drop the original table and rename the copy

As BigQuery doesn’t support renaming tables, you’ll have to drop the original table and then copy the temporary table into its place.

  1. To drop the original table, run the following from the BigQuery Web UI:

     DROP TABLE rep_sales.orders
    
  2. Copy the table via the Web UI. See BigQuery’s documentation for additional instructions.

    In the Copy Table dialog, define the fields as follows:

    • Destination dataset: Use the original dataset name. In this example, that’s rep_sales.
    • Destination table: Use the original table name. In this example, that’s orders.

    After you’ve copied the table, move onto the next step.

  3. To drop the temporary table, run the following from the BigQuery Web UI:

     DROP TABLE rep_sales.orders_tmp
    

Step 5: Unpause Stitch integrations

Return to Stitch and unpause any integrations that you paused in Step 2.



Questions? Feedback?

Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.