Google Sheets integration summary

Stitch’s Google Sheets integration replicates data using the Google Sheets v4 AP1. Refer to the Schema section for a list of objects available for replication.

Stitch’s Google Sheets integration will generate tables containing data related to metadata and the individual sheets within a spreadsheet.

Note: There are a few limitations:

  • Currently, the Google Sheets integration replicates one spreadsheet at a time. To replicate another spreadsheet, you will need to create another Google Sheets integration in Stitch.
  • The IMPORTRANGE() function in Google Sheets isn’t currently supported. This integration identifies new and updated data using a spreadsheet’s last updated_at value, which the IMPORTRANGE() doesn’t update when used.
  • Spreadsheets from shared Team Drives aren’t currently supported. Permission and/or File Not Found errors will surface during extraction if you connect a spreadsheet from a shared Team Drive.

Google Sheets feature snapshot

A high-level look at Stitch's Google Sheets (v1) integration, including release status, useful links, and the features supported in Stitch.

STITCH
Release status

Beta

Supported by

Stitch

Stitch plan

Standard

API availability

Available

Singer GitHub repository

singer-io/tap-google-sheets

REPLICATION SETTINGS
Anchor Scheduling

Supported

Advanced Scheduling

Supported

Table-level reset

Unsupported

Configurable Replication Methods

Unsupported

DATA SELECTION
Table selection

Supported

Column selection

Supported

TRANSPARENCY
Extraction Logs

Supported

Loading Reports

Supported

Connecting Google Sheets

Google Sheets setup requirements

To set up Google Sheets in Stitch, you need:

  • A spreadsheet in your Google Drive (My Drive). Stitch’s Google Sheets integration doesn’t currently support replicating spreadsheets from shared Team Drives.

  • A header row with unique column values in the first row of every sheet you want to replicate. If there are multiple headers not in the first row, your worksheet data may not be replicated correctly. Headers that aren’t in the first row may be extracted as column data.

  • A full row of data in the second row of every sheet you want to replicate. Data must begin in the second row of the sheet. Values in this row may not be NULL or issues will arise during Extraction.

Step 1: Obtain your spreadsheet ID

  1. Go to Google Sheets and log into the Google account associated with the spreadsheet you are looking to integrate.
  2. Open the spreadsheet that you want to use in the integration.
  3. The Spreadsheet ID is within the URL to the webpage. In the image below, the portion of the URL within the blue box is the Spreadsheet ID. Keep this readily available to continue with the integration. Note: The file should be stored in My Drive and not a shared drive or you’ll receive a File Not Found error.
Google Sheets URL containing the Spreadsheet ID.

Step 2: Add Google Sheets as a Stitch data source

  1. Sign into your Stitch account.
  2. On the Stitch Dashboard page, click the Add Integration button.

  3. Click the Google Sheets icon.

  4. Enter a name for the integration. This is the name that will display on the Stitch Dashboard for the integration; it’ll also be used to create the schema in your destination.

    For example, the name “Stitch Google Sheets” would create a schema called stitch_google_sheets in the destination. Note: Schema names cannot be changed after you save the integration.

  5. In the Spreadsheet ID field, enter your Spreadsheet ID you obtained from the previous step. Note: To integrate another spreadsheet, you’ll need to repeat these steps over again with another Google Sheets integration.

Step 3: Define the historical sync

The Sync Historical Data setting defines the starting date for your Google Sheets integration. This means that:

  • For tables using Key-based Incremental Replication, data equal to or newer than this date will be replicated to your destination.
  • For tables using Full Table Replication, all data - including records that are older, equal to, or newer than this date - will be replicated to your destination.

Change this setting if you want to replicate data beyond Google Sheets’s default setting of 1 year. For a detailed look at historical replication jobs, check out the Syncing Historical SaaS Data guide.

Step 4: Create a replication schedule

In the Replication Frequency section, you’ll create the integration’s replication schedule. An integration’s replication schedule determines how often Stitch runs a replication job, and the time that job begins.

Google Sheets integrations support the following replication scheduling methods:

To keep your row usage low, consider setting the integration to replicate less frequently. See the Understanding and Reducing Your Row Usage guide for tips on reducing your usage.

Step 5: Set tables and columns to replicate

To complete the setup, you’ll need to select the tables and columns you want to replicate to your destination.

Check out the Schema section to learn more about the available tables in Google Sheets and how they replicate.

  1. In the list of tables that displays - or in the Tables to Replicate tab, if you skipped this step during setup - locate a table you want to replicate.
  2. To track a table, click the checkbox next to the table’s name. A green checkmark means the table is set to replicate.

  3. To track a column, click the checkbox next to the column’s name. A green checkmark means the column is set to replicate.

  4. Repeat this process for all the tables and columns you want to replicate.
  5. When finished, click the Finalize Your Selections button at the bottom of the screen to save your selections.

Note: If you change these settings while a replication job is still in progress, they will not be used until the next job starts.

Initial and historical replication jobs

After you finish setting up Google Sheets, its Sync Status may show as Pending on either the Stitch Dashboard or in the Integration Details page.

For a new integration, a Pending status indicates that Stitch is in the process of scheduling the initial replication job for the integration. This may take some time to complete.

Free historical data loads

The first seven days of replication, beginning when data is first replicated, are free. Rows replicated from the new integration during this time won’t count towards your quota. Stitch offers this as a way of testing new integrations, measuring usage, and ensuring historical data volumes don’t quickly consume your quota.


Google Sheets replication

In this section:

Extraction

For every table set to replicate, Stitch will perform the following during Extraction:

Discovery

During Discovery, Stitch will:

Determining table schemas

At the start of each replication job, Stitch will check the sheets’s header row and first data row (the second row in the sheet) for data.

To be detected and properly replicated, every sheet set to replicate must have:

  1. Column headers with unique values in the first row. If there are duplicate column names, Stitch will skip the sheet and surface a duplicate column name error.

    For example: Two columns in the header row can’t be named customer_id. Uniqueness must not rely on case. While customer_id and Customer_ID may be unique due to case differences, this may still cause errors during extraction and loading. For this reason, column names must be completely unique.

  2. A full row of data in the second row. If any column in this row contains a NULL value, Stitch will skip the sheet and surface a malformed sheet message during extraction.

If the sheet doesn’t contain a header row and a second row of data, Stitch will skip the sheet and surface an empty sheet message during extraction.

Data typing

To determine data types, Stitch will analyze the first two rows in the files included in object discovery.

If a column has been specified as a STRING, Stitch will attempt to parse the value as a string. If this fails, the column will be loaded as a nullable STRING.

For all other columns, Stitch will perform the following to determine the column’s data type:

  1. Attempt to parse the value as a BOOLEAN value
  2. If that fails, attempt to parse the value as an INTEGER
  3. If that fails, attempt to parse the value as a DATE-TIME value
  4. If that fails, attempt to parse the value as a DATE date
  5. If that fails, attempt to parse the value as a TIME value
  6. If that fails, type the column as a STRING

Data replication

After discovery is completed, Stitch will move onto extracting data from the sheets set to replicate.

While data from Google Sheets integrations is replicated using Key-based Incremental Replication, the behavior for this integration differs subtly from other integrations.

The table below compares Key-based Incremental Replication and Replication Key behavior for Google Sheets to that of other integrations.

Google Sheets Other integrations
What's replicated during a replication job?

The entire contents of a modified spreadsheet. This includes all sheets in the spreadsheet that are set to replicate, regardless of whether they have been modified.

Only new or updated rows in a table.

What's used as a Replication Key?

The time a spreadsheet is modified.

A column or columns in a table.

Are Replication Keys inclusive?

No. Only spreadsheets with a modification timestamp value greater than the last saved bookmark are replicated.

Yes. Rows with a Replication Key value greater than or equal to the last saved bookmark are replicated.

To reduce row usage, consider scheduling the integration to replicate less frequently.

Loading

For every sheet you set to replicate, Stitch will create a table in your destination. These tables will contain the columns you select for replication, along with some system columns created by Stitch. Refer to the sample table in the next section for an example.


Google Sheets table reference

Replication Method :

Key-based Incremental

Replication Key :

modifiedTime

Primary Key :

id

API endpoint :

getSheets

The file_metadata table contains metadata about the spreadsheet defined in the integration’s settings.

id
STRING

The file ID.

modifiedTime
DATE-TIME

The date and time the file was last modified.

createdTime
DATE-TIME

The date the file was created.

driveId
STRING

The ID of the drive containing the file.

lastModifyingUser
OBJECT

The user who last modified the file.

displayName
STRING

The user’s display name.

emailAdress
STRING

The user’s email address.

kind
STRING

The type of user.

file_metadata (table), lastModifyingUser (attribute)

name
STRING

The name of the file.

teamDriveId
STRING

The team drive ID.

version
INTEGER

The version of the file.


Replication Method :

Key-based Incremental

Replication Key :

Spreadsheet’s modified_at

Primary Key :

__sdc_row

API endpoint :

getSheets

This is an example of a table created from a sheet you set to replicate.

For every sheet you set to replicate, Stitch will create a corresponding table in your destination. The table will contain the columns you select in Stitch, along with a few columns Stitch requires for replication.

Refer to the Data replication section for more info about how this table replicates.

__sdc_row
INTEGER

The number of the row in the spreadsheet.

__sdc_sheet_id
INTEGER

The ID of the sheet containing the record.

Reference:

__sdc_spreadsheet_id
STRING

The ID of the spreadsheet containing the record.

Reference:

[COLUMNS_YOU_SELECT]
VARIES

A column will be created for every column set to replicate in Stitch.


Replication Method :

Full Table

Primary Key :

sheetId : spreadsheetId

API endpoint :

getSheets

The sheet_metadata table contains metadata about the sheets within the spreadsheet defined in the integration’s settings.

sheetId
INTEGER

The ID of the sheet.

Reference:

spreadsheetId
STRING

The ID of the spreadsheet containing the sheet.

Reference:

columns
OBJECT

Details about the columns in the sheet.

columnIndex
INTEGER

The column index in the data table on which the filter is applied to.

columnLetter
STRING

The letter of the column.

columnName
STRING

The header name in the column.

columnType
STRING

The type of column.

columnSkipped
BOOLEAN

Field that identifies if the column has been skipped.

type
STRING

format
STRING

sheet_metadata (table), columns (attribute)

gridProperties
OBJECT

The properties of a grid.

columnCount
INTEGER

The number of columns in the grid.

frozenColumnCount
INTEGER

The number of columns that are frozen in the grid.

frozenRowCount
INTEGER

The number of rows that are frozen in the grid.

rowCount
INTEGER

The number of rows in the grid.

sheet_metadata (table), gridProperties (attribute)

index
INTEGER

The index of the sheet within the spreadsheet.

sheetType
STRING

The type of the sheet. Possible values are:

  • sheet_type_unspecified
  • object
  • grid

sheetUrl
STRING

The URL of the sheet.

title
STRING

The title of the spreadsheet.


Replication Method :

Full Table

Primary Key :

sheetId : spreadsheetId

API endpoint :

getSheets

The sheets_loaded table contains metadata about individual sheets loaded to your destination.

sheetId
INTEGER

The ID of the sheet.

Reference:

spreadsheetId
STRING

The ID of the spreadsheet.

Reference:

lastRowNumber
INTEGER

The number of the last row.

loadDate
DATE-TIME

The date the sheets were loaded.

title
STRING

The title of the spreadsheet.


spreadsheet_metadata

Replication Method :

Full Table

Primary Key :

spreadsheetId

API endpoint :

getSheets

The spreadsheet_metadata table contains metadata about the spreadsheet defined in the integration’s settings.

spreadsheetId
STRING

The ID of the spreadsheet.

Reference:

properties
OBJECT

Overall properties of a spreadsheet.

autoRecalc
STRING

The amount of time to wait before volatile functions are recalculated.

locale
STRING

The locale of the spreadsheet in one of the following formats:

  • an ISO 639-1 language code such as en
  • an ISO 639-2 language code such as fil, if no 639-1 code exists
  • a combination of the ISO language code and country code, such as en_US

timeZone
STRING

The time zone of the spreadsheet, in CLDR format such as America/New_York. If the time zone isn’t recognized, this may be a custom time zone such as GMT-07:00.

title
STRING

The title of the spreadsheet.

spreadsheet_metadata (table), properties (attribute)

spreadsheetUrl
STRING

The url of the spreadsheet.



Questions? Feedback?

Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.