Release Status: Released
Supported By: Singer community
Availability: Free
Status Page: GitHub Status Page
Default Historical Sync: 1 year
Default Replication Frequency: 1 hour
Whitelisting: Tables and columns
Destination Incompatibilities: None

Connecting GitHub

GitHub Setup requirements

To set up GitHub in Stitch, you need:

  • A valid access token that grants access to every project you want to replicate data from. Stitch can access only the projects available to the user who creates the access token.

Step 1: Create a GitHub token

  1. Sign into your GitHub account.
  2. Click the User menu (your icon) > Settings.
  3. Click Developer settings in the navigation on the left side of the page.
  4. Click Personal access tokens.
  5. On the Personal access tokens page, click the Generate new token button. If prompted, enter your password.
  6. In the Description field, enter stitch. This will allow you to easily identify which application is using the token.
  7. Click the Generate token button.
  8. The new access token will display on the next page. Copy the token before navigating away from the page - GitHub won’t display it again.
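
Before pasting the token into Stitch, it can be worth confirming that it authenticates. The following is a minimal sketch, not part of Stitch: it assumes GitHub's public REST endpoint `https://api.github.com/user` and token-based `Authorization` header, and the function name `build_token_check_request` is hypothetical. It only builds the request; sending it is left to you.

```python
import urllib.request

# GitHub REST endpoint that returns the user who owns the token.
GITHUB_USER_URL = "https://api.github.com/user"


def build_token_check_request(token: str) -> urllib.request.Request:
    """Build (but do not send) a request that identifies the token's owner."""
    return urllib.request.Request(
        GITHUB_USER_URL,
        headers={
            # Personal access tokens are passed in the Authorization header.
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
        },
    )
```

Passing the result to `urllib.request.urlopen` should return the user's profile as JSON when the token is valid, and raise an HTTP 401 error when it is not.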

Step 2: Add GitHub as a Stitch data source

  1. Sign into your Stitch account.
  2. On the Stitch Dashboard page, click the Add Integration button.

  3. Click the GitHub icon.

  4. Enter a name for the integration. This is the name that will display on the Stitch Dashboard for the integration; it’ll also be used to create the schema in your destination.

    For example, the name “Stitch GitHub” would create a schema called stitch_github in the destination. Note: Schema names cannot be changed after you save the integration.

  5. In the GitHub Access Token field, paste the access token you created in Step 1.
  6. In the GitHub Repository Name field, enter the repository you want to track. For example: docs

    Note: At this time, only one repository may be tracked per integration. To track multiple repositories, you’ll need to create additional GitHub integrations in your Stitch account.
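
The schema naming described in step 4 can be illustrated with a short sketch. The exact rules Stitch applies are not stated here, so the lowercasing and underscore substitution below are assumptions chosen to reproduce the “Stitch GitHub” example, and `to_schema_name` is a hypothetical helper.

```python
import re


def to_schema_name(integration_name: str) -> str:
    """Approximate how an integration name could map to a schema name.

    Assumption: lowercase the name and collapse every run of
    non-alphanumeric characters into a single underscore.
    """
    lowered = integration_name.strip().lower()
    return re.sub(r"[^a-z0-9]+", "_", lowered).strip("_")
```

Because schema names cannot be changed after saving, previewing the name this way before creating the integration can help avoid a rename later.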

Step 3: Define the Historical Sync

The Sync Historical Data setting will define the starting date for your GitHub integration. This means that:

  • For tables using Incremental Replication, data equal to or newer than this date will be replicated to your data warehouse.
  • For tables using Full Table Replication, all data - including records that are older, equal to, or newer than this date - will be replicated to your data warehouse.

Change this setting if you want to replicate data older than the GitHub integration’s default of 1 year. For a detailed look at historical replication jobs, check out the Syncing Historical SaaS Data guide.
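
The difference between the two replication methods can be sketched as a simple filter. This is an illustration only, not Stitch's implementation: the `updated_at` key, the method labels, and the function name are hypothetical.

```python
from datetime import date


def records_to_replicate(records, start_date, replication_method):
    """Return the records a sync would pick up for a given start date.

    Assumption: each record is a dict with an `updated_at` date.
    """
    if replication_method == "full_table":
        # Full Table Replication ignores the start date entirely.
        return list(records)
    # Incremental Replication keeps records equal to or newer than the date.
    return [r for r in records if r["updated_at"] >= start_date]
```

With a start date of 1 January 2023, a record last updated in 2022 would be skipped by an incremental table but still copied by a full table.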

Step 4: Create a replication schedule

In the Replication Frequency section, you’ll create the integration’s replication schedule. An integration’s replication schedule determines how often Stitch runs a replication job, and the time that job begins.

Stitch offers two methods of creating a replication schedule:

  • Replication Frequency: This method requires selecting the interval at which replication runs for the integration. Start times of replication jobs are based on the start time and duration of the previous job. Refer to the Replication Frequency documentation for more information and examples.
  • Anchor scheduling: Based on the Replication Frequency, or interval, you select, this method “anchors” the start times of this integration’s replication jobs to a time you select to create a predictable schedule. Anchor scheduling is a combination of the Anchor Time and Replication Frequency settings, which must both be defined to use this method. Additionally, note that:

    • A Replication Frequency of at least one hour is required to use anchor scheduling.
    • An initial replication job may not begin immediately after saving the integration, depending on the selected Replication Frequency and Anchor Time. Refer to the Anchor Scheduling documentation for more information.
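
To see why the first job may not start immediately under anchor scheduling, consider the following sketch. It assumes that job start times fall on the Anchor Time plus whole multiples of the Replication Frequency; that model and the function name `next_anchored_start` are assumptions for illustration, not a description of Stitch's scheduler.

```python
from datetime import datetime, timedelta


def next_anchored_start(anchor: datetime, frequency: timedelta, now: datetime) -> datetime:
    """Return the first anchored start time at or after `now`."""
    if frequency < timedelta(hours=1):
        raise ValueError("anchor scheduling requires a frequency of at least one hour")
    if now <= anchor:
        return anchor
    elapsed = now - anchor
    # Ceiling division: number of whole intervals needed to reach or pass `now`.
    intervals = -(-elapsed // frequency)
    return anchor + intervals * frequency
```

Under these assumptions, an integration saved at 07:30 with a midnight anchor and a six-hour frequency would wait until 12:00 for its first job.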

To help prevent overages, consider setting the integration to replicate less frequently. See the Understanding and Reducing Your Row Usage guide for tips on reducing your usage.

Initial and historical replication jobs

After you finish setting up GitHub, its Sync Status may show as Pending either on the Stitch Dashboard or on the Integration Details page.

For a new integration, a Pending status indicates that Stitch is in the process of scheduling the initial replication job for the integration. This may take some time to complete.

Free historical data loads

The first seven days of replication, beginning when data is first replicated, are free. Rows replicated from the new integration during this time won’t count towards your quota. Stitch offers this as a way of testing new integrations, measuring usage, and ensuring historical data volumes don’t quickly consume your quota.


GitHub table schemas

assignees

Replication Method: Full Table
Primary Key: id
API Endpoint: listAssignees

The assignees table contains info about the available assignees for issues in a repository.

id
INTEGER

The assignee ID.

login
STRING

The user’s username.

type
STRING

The user’s type.

url
STRING

The profile URL associated with the user.


collaborators

Replication Method: Full Table
Primary Key: id
API Endpoint: listCollaborators

The collaborators table contains info about the users who contribute to a repository.

For organization-owned repositories, this will include outside collaborators, organization owners, and organization members who are direct collaborators, who have access through team memberships, or who have access through default organization permissions.

id
INTEGER

The collaborator’s ID.

login
STRING

The collaborator’s username.

type
STRING

The collaborator’s type.

url
STRING

The profile URL associated with the collaborator.


commits

Replication Method: Key-based Incremental
Replication Key: since
Primary Key: sha
API Endpoint: listRepositoryCommits

The commits table contains info about repository commits in a project.

sha
STRING

The git commit hash.

comments_url
STRING

The URL to the commit’s comments page.

commit__url
STRING

The URL to the commit.

commit__tree__sha
STRING

The git commit tree hash.

commit__tree__url
STRING

The URL to the commit tree.

commit__author__date
STRING

The date the author committed the change.

commit__author__email
STRING

The author’s email address.

commit__author__name
STRING

The author’s name.

commit__message
STRING

The commit message.

commit__committer__date
STRING

The date the committer committed the change.

commit__committer__email
STRING

The committer’s email address.

commit__committer__name
STRING

The committer’s name.

commit__comment_count
INTEGER

The number of comments on the commit.

html_url
STRING

The HTML URL to the commit.

parents

Details about the parent commits.

If your destination doesn't natively support nested data, this data may be denested into a subtable named commits__parents. Refer to the Singer schema for details on possible attributes.

_sdc_source_key_sha
STRING

The git commit hash.

_sdc_level_0_id
INTEGER

This column forms part of a composite key for the table. The value will auto-increment for each unique record, beginning with 0.

sha
STRING

The git hash of the parent commit.

html_url
STRING

The HTML URL to the parent commit.

url
STRING

The URL to the parent commit.

url
STRING

The URL to the commit.
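
The denesting of parents into a commits__parents subtable, described above, can be sketched as follows. This is an illustration of the column layout only, assuming a commit record shaped like the fields listed in this table; `denest_parents` is a hypothetical helper, not Stitch's loader.

```python
def denest_parents(commit: dict) -> list:
    """Flatten a commit's `parents` array into commits__parents rows."""
    rows = []
    for index, parent in enumerate(commit.get("parents", [])):
        rows.append({
            # Ties the row back to the parent record in the commits table.
            "_sdc_source_key_sha": commit["sha"],
            # Position within the nested array, beginning with 0.
            "_sdc_level_0_id": index,
            "sha": parent.get("sha"),
            "html_url": parent.get("html_url"),
            "url": parent.get("url"),
        })
    return rows
```

Joining commits__parents to commits on `_sdc_source_key_sha = sha` would then recover each commit's parents; a merge commit yields two rows, with `_sdc_level_0_id` values 0 and 1.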


issues

Replication Method: Key-based Incremental
Replication Key: updated_at
Primary Key: id
API Endpoint: listIssuesForRepository

The issues table contains info about repository issues.

Issues and pull requests

GitHub’s API considers every pull request an issue, but not every issue is a pull request. Therefore, this table may contain both issues and pull requests.

id
INTEGER

The issue ID.

updated_at
DATE-TIME

The last time the issue was updated.
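
Because this table can mix issues and pull requests, you may want to separate them downstream. In GitHub's REST API, an issue that is also a pull request carries a `pull_request` key. The sketch below assumes the replicated records retain that key, which is not confirmed by the column list here; the function name is hypothetical.

```python
def split_issues_and_pull_requests(records):
    """Separate plain issues from issues that are really pull requests."""
    issues, pull_requests = [], []
    for record in records:
        # Assumption: pull requests expose a non-empty `pull_request` value.
        if record.get("pull_request"):
            pull_requests.append(record)
        else:
            issues.append(record)
    return issues, pull_requests
```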


pull_requests

Replication Method: Full Table
Primary Key: id
API Endpoint: listPullRequests

The pull_requests table contains info about pull requests made against the repository.

id
STRING

The pull request ID.

updated_at
DATE-TIME

The last time the pull request was updated.

body
STRING

The description of the pull request.

closed_at
STRING

The time the pull request was closed.

created_at
STRING

The time the pull request was created.

merged_at
STRING

The time the pull request was merged.

number
INTEGER

The number of the pull request in the repository.

state
STRING

The current status of the pull request. For example: open

title
STRING

The title of the pull request.

url
STRING

The URL to the pull request.

user__id
INTEGER

The user ID.

user__login
STRING

The user’s GitHub username.


reviews

Replication Method: Full Table
Primary Key: id
API Endpoint: listReviewsOnPullRequest

The reviews table contains info about pull request reviews. A pull request review is a group of comments on a pull request.

id
INTEGER

The review ID.

body
STRING

The description of the review.

commit_id
STRING

The ID of the commit the review was performed on.

html_url
STRING

The HTML URL to the review.

pull_request_url
STRING

The URL to the pull request being reviewed.

state
STRING

The state of the review. Possible values are:

  • APPROVED
  • PENDING
  • CHANGES_REQUESTED

user__id
INTEGER

The user ID.

user__login
STRING

The user’s GitHub username.


stargazers

Replication Method: Key-based Incremental
Replication Key: starred_at
Primary Key: user_id
API Endpoint: listStargazers

The stargazers table contains info about users who have starred a repository.

user_id
INTEGER

The user ID.

starred_at
STRING

The time the user starred the repository.

user__id
INTEGER

The user ID.


