This integration is powered by Singer's GitHub tap. For support, visit the GitHub repo or join the Singer Slack.
GitHub snapshot
A high-level look at Stitch's GitHub integration, including release status, useful links, and the features supported in Stitch.
STITCH
Release Status | Released
Supported By |
Stitch Plan | Free
Singer GitHub Repository |
DATA SELECTION
Table Selection | Supported
Column Selection | Supported
REPLICATION SETTINGS
Anchor Scheduling | Supported
Table-level Reset | Unsupported
Configurable Replication Methods | Unsupported
TRANSPARENCY
Extraction Logs | Supported
Loading Reports | Supported
Connecting GitHub
GitHub setup requirements
To set up GitHub in Stitch, you need:
- A valid access token that allows access to any projects you want to replicate data from. Stitch will only be able to access the same projects as the user who creates the access token.
Step 1: Create a GitHub token
- Sign into your GitHub account.
- Click the User menu (your icon) > Settings.
- Click Developer settings in the navigation on the left side of the page.
- Click Personal access tokens.
- On the Personal access tokens page, click the Generate new token button. If prompted, enter your password.
- In the Description field, enter stitch. This will allow you to easily identify which application is using the token.
- Click the Generate token button.
- The new access token will display on the next page. Copy the token before navigating away from the page - GitHub won’t display it again.
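Before pasting the token into Stitch, it can be useful to confirm that it can read the repository you plan to track. The following is a minimal sketch, not part of the Stitch setup itself: the helper names `build_repo_request` and `token_can_read` are hypothetical, and it assumes the public GitHub REST API at `https://api.github.com` and the standard `Authorization: Bearer` header for personal access tokens.

```python
import urllib.error
import urllib.request

GITHUB_API = "https://api.github.com"  # assumed: public GitHub, not GitHub Enterprise


def build_repo_request(token: str, repository: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a repository's metadata.

    `repository` is expected in `owner/repo` form.
    """
    request = urllib.request.Request(f"{GITHUB_API}/repos/{repository}")
    request.add_header("Authorization", f"Bearer {token}")
    request.add_header("Accept", "application/vnd.github+json")
    return request


def token_can_read(token: str, repository: str) -> bool:
    """Return True if GitHub answers 200 for the repository with this token."""
    try:
        with urllib.request.urlopen(build_repo_request(token, repository)) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # 401 (bad token) and 404 (no access or no such repository) both land here.
        return False
```

Calling `token_can_read(token, "mygithubusername/docs")` sends a single request; a `False` result suggests the token's user cannot see that repository, in which case Stitch would not be able to either.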
Step 2: Add GitHub as a Stitch data source
- Sign into your Stitch account.
- On the Stitch Dashboard page, click the Add Integration button.
- Click the GitHub icon.
- Enter a name for the integration. This is the name that will display on the Stitch Dashboard for the integration; it’ll also be used to create the schema in your destination.
For example, the name “Stitch GitHub” would create a schema called stitch_github in the destination. Note: Schema names cannot be changed after you save the integration.
- In the GitHub Access Token field, paste the access token you created in Step 1.
- In the GitHub Repository Name field, enter the username and repository you want to track, separated by a forward slash. For example:
mygithubusername/docs
Note: At this time, only one repository may be tracked per integration. To track multiple repositories, you’ll need to create additional GitHub integrations in your Stitch account.
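The two values entered in this step follow simple patterns that can be checked before saving. The sketch below is illustrative only: `schema_name_for` and `parse_repository` are hypothetical helpers, the schema normalization rule (lowercase, non-alphanumeric runs replaced by an underscore) is an assumption inferred from the single "Stitch GitHub" example, and the allowed characters in the repository pattern are likewise assumed rather than taken from Stitch.

```python
import re

# Assumed character rules for "owner/repo"; Stitch's own validation may differ.
_REPO_PATTERN = re.compile(r"^[A-Za-z0-9-]+/[A-Za-z0-9._-]+$")


def schema_name_for(integration_name: str) -> str:
    """Assumed normalization: lowercase, runs of other characters become '_'."""
    return re.sub(r"[^a-z0-9]+", "_", integration_name.lower()).strip("_")


def parse_repository(value: str) -> tuple[str, str]:
    """Split a `username/repository` string into its two parts."""
    value = value.strip()
    if not _REPO_PATTERN.match(value):
        raise ValueError(f"expected 'username/repository', got {value!r}")
    owner, repo = value.split("/")
    return owner, repo
```

For example, `schema_name_for("Stitch GitHub")` gives `stitch_github`, and `parse_repository("mygithubusername/docs")` gives `("mygithubusername", "docs")`. A value without exactly one forward slash raises `ValueError`.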
Step 3: Define the historical sync
The Sync Historical Data setting will define the starting date for your GitHub integration. This means that:
- For tables using Incremental Replication, data equal to or newer than this date will be replicated to your data warehouse.
- For tables using Full Table Replication, all data - including records that are older, equal to, or newer than this date - will be replicated to your data warehouse.
Change this setting if you want to replicate data beyond GitHub’s default setting of 1 year. For a detailed look at historical replication jobs, check out the Syncing Historical SaaS Data guide.
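The difference between the two replication methods with respect to the start date can be sketched as a filter. This is an illustration of the rule described above, not Stitch's implementation: the function name `records_to_replicate`, the method labels, and the use of an `updated_at` field as the comparison value are assumptions made for the example.

```python
from datetime import datetime, timezone


def records_to_replicate(records, start_date, method):
    """Return the records a job would pick up for the given replication method."""
    if method == "full_table":
        # Full Table: the start date is ignored and every record is replicated.
        return list(records)
    if method == "incremental":
        # Incremental: only records equal to or newer than the start date.
        return [r for r in records if r["updated_at"] >= start_date]
    raise ValueError(f"unknown replication method: {method!r}")


start = datetime(2023, 1, 1, tzinfo=timezone.utc)
rows = [
    {"id": 1, "updated_at": datetime(2022, 6, 1, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2023, 1, 1, tzinfo=timezone.utc)},
    {"id": 3, "updated_at": datetime(2023, 8, 15, tzinfo=timezone.utc)},
]
incremental_ids = [r["id"] for r in records_to_replicate(rows, start, "incremental")]
full_table_ids = [r["id"] for r in records_to_replicate(rows, start, "full_table")]
```

With the sample rows, the incremental selection contains records 2 and 3, while the full table selection contains all three.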
Step 4: Create a replication schedule
In the Replication Frequency section, you’ll create the integration’s replication schedule. An integration’s replication schedule determines how often Stitch runs a replication job, and the time that job begins.
Stitch offers two methods of creating a replication schedule:
- Replication Frequency: This method requires selecting the interval you want replication to run for the integration. Start times of replication jobs are based on the start time and duration of the previous job. Refer to the Replication Frequency documentation for more information and examples.
- Anchor scheduling: Based on the Replication Frequency, or interval, you select, this method “anchors” the start times of this integration’s replication jobs to a time you select to create a predictable schedule. Anchor scheduling is a combination of the Anchor Time and Replication Frequency settings, which must both be defined to use this method. Additionally, note that:
- A Replication Frequency of at least one hour is required to use anchor scheduling.
- An initial replication job may not begin immediately after saving the integration, depending on the selected Replication Frequency and Anchor Time. Refer to the Anchor Scheduling documentation for more information.
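To see why anchored start times are predictable, and why the first job may not start right away, the idea can be sketched as rounding the current time up to the next slot on a grid defined by the Anchor Time and the Replication Frequency. This is a simplified model under stated assumptions, not Stitch's scheduler; `next_anchored_start` is a hypothetical name.

```python
import math
from datetime import datetime, timedelta


def next_anchored_start(anchor: datetime, frequency: timedelta, now: datetime) -> datetime:
    """Return the first slot at or after `now` on the grid anchor + k * frequency."""
    if frequency < timedelta(hours=1):
        raise ValueError("anchor scheduling requires a frequency of at least one hour")
    if now <= anchor:
        return anchor
    # Number of whole intervals needed to reach or pass `now`.
    intervals = math.ceil((now - anchor) / frequency)
    return anchor + intervals * frequency


anchor = datetime(2024, 1, 1, 0, 0)
saved_at = datetime(2024, 1, 1, 7, 30)
first_run = next_anchored_start(anchor, timedelta(hours=6), saved_at)
```

In this model, an integration saved at 07:30 with a six-hour frequency anchored at 00:00 would wait until 12:00 for its first job, which matches the note that an initial job may not begin immediately.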
To help prevent overages, consider setting the integration to replicate less frequently. See the Understanding and Reducing Your Row Usage guide for tips on reducing your usage.
Initial and historical replication jobs
After you finish setting up GitHub, its Sync Status may show as Pending either on the Stitch Dashboard or on the Integration Details page.
For a new integration, a Pending status indicates that Stitch is in the process of scheduling the initial replication job for the integration. This may take some time to complete.
Initial replication jobs with Anchor Scheduling
If using Anchor Scheduling, an initial replication job may not kick off immediately. This depends on the selected Replication Frequency and Anchor Time. Refer to the Anchor Scheduling documentation for more information.
Free historical data loads
The first seven days of replication, beginning when data is first replicated, are free. Rows replicated from the new integration during this time won’t count towards your quota. Stitch offers this as a way of testing new integrations, measuring usage, and ensuring historical data volumes don’t quickly consume your quota.
GitHub table schemas
Table and column names in your destination
Depending on your destination, table and column names may not appear as they are outlined below.
For example: Object names are lowercased in Redshift (CusTomERs > customers), while case is maintained in PostgreSQL destinations (CusTomERs > CusTomERs). Refer to the Loading Guide for your destination for more info.
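The two cases named in the example can be written as a small lookup, which may help when writing queries that need to work against either destination. The function name `destination_object_name` and the destination labels are hypothetical, and only the two behaviors stated above are modeled; other destinations are deliberately left unhandled.

```python
def destination_object_name(name: str, destination: str) -> str:
    """Return how a table or column name would appear in the destination."""
    if destination == "redshift":
        return name.lower()  # Redshift lowercases object names
    if destination == "postgresql":
        return name  # PostgreSQL destinations keep the original case
    raise ValueError(f"no naming rule modeled for destination {destination!r}")
```

For any other destination, the Loading Guide remains the source of truth for naming rules.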
assignees
Replication Method : |
Full Table |
Primary Key |
id |
API endpoint : |
The assignees table contains info about the available assignees for issues in a repository.
id
INTEGER |
The assignee ID. |
login
STRING |
The user’s username. |
type
STRING |
The user’s type. |
url
STRING |
The profile URL associated with the user. |
collaborators
Replication Method : |
Full Table |
Primary Key |
id |
API endpoint : |
The collaborators table contains info about the users who contribute to a repository.
For organization-owned repositories, this includes outside collaborators, organization owners, and organization members who are direct collaborators, have access through team memberships, or have access through default organization permissions.
id
INTEGER |
The collaborator’s ID. Reference: |
login
STRING |
The collaborator’s username. |
type
STRING |
The collaborator’s type. |
url
STRING |
The profile URL associated with the collaborator. |
comments
Replication Method : |
Key-based Incremental |
Replication Key |
updated_at |
Primary Key |
id |
API endpoint : |
The comments table contains info about comments made on issues.
id
INTEGER |
The comment ID. |
updated_at
DATE-TIME |
The time the comment was last updated. |
body
STRING |
The body of the comment. |
created_at
DATE-TIME |
The time the comment was created. |
home_url
STRING |
The home URL of the comment. |
html_url
STRING |
The HTML URL of the comment. |
issue_url
STRING |
The URL of the issue associated with the comment. |
node_id
STRING |
The node ID. |
url
STRING |
The GitHub URL of the comment. |
user__login
STRING |
The login name of the user who created the comment. |
user__id
STRING |
The ID of the user who created the comment. |
user__node_id
STRING |
The node ID of the user who created the comment. |
user__avatar_url
STRING |
The URL of the avatar of the user who created the comment. |
user__gravatar_id
STRING |
The Gravatar ID of the user who created the comment. |
user__url
STRING |
The API URL of the user who created the comment. |
user__html_url
STRING |
The GitHub URL of the user who created the comment. |
user__followers_url
STRING |
The URL to the user’s followers page. |
user__following_url
STRING |
The URL to the user’s following page. |
user__gists_url
STRING |
The URL to the user’s gists page. |
user__starred_url
STRING |
The URL to the user’s starred page. |
user__subscriptions_url
STRING |
The URL to the user’s subscriptions page. |
user__organizations_url
STRING |
The URL to the user’s organizations page. |
user__repos_url
STRING |
The URL to the user’s repositories page. |
user__events_url
STRING |
The URL to the user’s events page. |
user__received_events_url
STRING |
The URL to the user’s received events page. |
user__type
STRING |
The type of the user. |
user__site_admin
STRING |
Indicates if the user is a site administrator. |
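Column names such as `user__login` and `user__id` suggest that nested objects in the API response are flattened, with the parent and child keys joined by a double underscore. The sketch below shows that naming pattern under this assumption; `flatten` is a hypothetical helper and does not describe how Stitch performs the transformation internally.

```python
def flatten(record: dict, parent_key: str = "", separator: str = "__") -> dict:
    """Flatten nested dictionaries into a single level of column names."""
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}{separator}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse so deeper levels gain one more separator per level.
            flat.update(flatten(value, column, separator))
        else:
            flat[column] = value
    return flat


comment = {"id": 1, "body": "Looks good", "user": {"login": "octocat", "type": "User"}}
row = flatten(comment)
```

Here `row` has the keys `id`, `body`, `user__login`, and `user__type`, mirroring the column layout listed for this table.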
commits
Replication Method : |
Key-based Incremental |
Replication Key |
since |
Primary Key |
sha |
API endpoint : |
The commits table contains info about repository commits in a project.
sha
STRING |
The git commit hash. |
comments_url
STRING |
The URL to the commit’s comments page. |
commit__url
STRING |
The URL to the commit. |
commit__tree__sha
STRING |
The git commit tree hash. |
commit__tree__url
STRING |
The URL to the commit tree. |
commit__author__date
STRING |
The date the author committed the change. |
commit__author__email
STRING |
The author’s email address. |
commit__author__name
STRING |
The author’s name. |
commit__message
STRING |
The commit message. |
commit__committer__date
STRING |
The date the committer committed the change. |
commit__committer__email
STRING |
The committer’s email address. |
commit__committer__name
STRING |
The committer’s name. |
commit__comment_count
INTEGER |
The number of comments on the commit. |
html_url
STRING |
The HTML URL to the commit. |
parents |
Details about the parent commits.
If your destination doesn't natively support nested data, this data may be denested into a subtable named
|
url
STRING |
The URL to the commit. |
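Denesting the `parents` array can be pictured as moving each array element into its own row that points back to the commit it came from. The following is a generic sketch of that idea only: `denest` is a hypothetical helper, and the linking column names it produces (`parent_sha`, `position`) are invented for illustration and are not the column names Stitch uses in its subtables.

```python
def denest(record: dict, primary_key: str, nested_field: str):
    """Split a record into a parent row and child rows for one nested array."""
    parent = {k: v for k, v in record.items() if k != nested_field}
    children = [
        # Each child row carries the parent's key and its position in the array.
        {f"parent_{primary_key}": record[primary_key], "position": index, **item}
        for index, item in enumerate(record.get(nested_field) or [])
    ]
    return parent, children


commit = {"sha": "abc", "url": "u", "parents": [{"sha": "p1"}, {"sha": "p2"}]}
commit_row, parent_rows = denest(commit, "sha", "parents")
```

A merge commit with two parents therefore yields one commit row and two child rows, which can be joined back on the commit's `sha`.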
issues
Replication Method : |
Key-based Incremental |
Replication Key |
updated_at |
Primary Key |
id |
API endpoint : |
The issues table contains info about repository issues.
Issues and pull requests
GitHub’s API considers every pull request an issue, but not every issue is a pull request. Therefore, this table may contain both issues and pull requests.
id
INTEGER |
The issue ID. Reference: |
updated_at
DATE-TIME |
The last time the issue was updated. |
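Because this table can mix issues and pull requests, downstream analysis often needs to separate them. In the GitHub REST API, an issue object that represents a pull request carries a `pull_request` key. The sketch below relies on that key and works on raw API-style dictionaries; whether and how the field appears in your destination table is an assumption you would need to verify, and `split_issues` is a hypothetical helper.

```python
def is_pull_request(issue: dict) -> bool:
    """True when the issue object carries GitHub's `pull_request` marker key."""
    return "pull_request" in issue


def split_issues(rows):
    """Partition rows into (plain issues, pull requests), preserving order."""
    issues, pull_requests = [], []
    for row in rows:
        (pull_requests if is_pull_request(row) else issues).append(row)
    return issues, pull_requests


rows = [
    {"id": 10, "title": "Bug report"},
    {"id": 11, "title": "Fix bug", "pull_request": {"url": "..."}},
]
plain, prs = split_issues(rows)
```

With the sample rows, `plain` holds the bug report and `prs` holds the fix.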
pull_requests
Replication Method : |
Full Table |
Primary Key |
id |
API endpoint : |
The pull_requests table contains info about pull requests made against the repository.
id
STRING |
The pull request ID. Reference: |
updated_at
DATE-TIME |
The last time the pull request was updated. |
body
STRING |
The description of the pull request. |
closed_at
STRING |
The time the pull request was closed. |
created_at
STRING |
The time the pull request was created. |
merged_at
STRING |
The time the pull request was merged. |
number
INTEGER |
The number of the pull request in the repository. |
state
STRING |
The current status of the pull request. For example: |
title
STRING |
The title of the pull request. |
url
STRING |
The URL to the pull request. |
user__id
INTEGER |
The user ID. Reference: |
user__login
STRING |
The user’s GitHub username. |
review_comments
Replication Method : |
Key-based Incremental |
Replication Key |
updated_at |
Primary Key |
id |
API endpoint : |
The review_comments table contains info about comments made on pull request reviews.
Note: In order to replicate this table, you must also set the pull_requests table to replicate.
id
INTEGER |
The review comment ID. |
updated_at
DATE-TIME |
The time the review comment was last updated. |
body
STRING |
The body of the review comment. |
commit_id
STRING |
The ID of the commit the review comment is associated with. Reference: |
created_at
DATE-TIME |
The time the review comment was created. |
diff_url
STRING |
The diff URL associated with the review comment. |
html_url
STRING |
The HTML URL of the review comment. |
in_reply_to_id
INTEGER |
If the review comment is a reply to another review comment, this will be the ID of the review comment it is in response to. |
issue_url
STRING |
The URL of the issue associated with the review comment. |
node_id
STRING |
The review comment’s node ID. |
original_position
INTEGER |
The original position of the review comment. |
original_commit_id
STRING |
The ID of the original commit the review comment is associated with. Reference: |
pull_request_review_id
INTEGER |
The ID of the pull request review the comment is a part of. Reference: |
path
STRING |
The path of the file the review comment was made on. |
position
INTEGER |
The position of the review comment. |
pull_request_url
STRING |
The URL of the pull request associated with the review comment. |
url
STRING |
The GitHub URL of the review comment. |
user__login
STRING |
The login name of the user who created the review comment. |
user__id
STRING |
The ID of the user who created the review comment. |
reviews
Replication Method : |
Full Table |
Primary Key |
id |
API endpoint : |
The reviews table contains info about pull request reviews. A pull request review is a group of comments on a pull request.
Note: In order to replicate this table, you must also set the pull_requests table to replicate.
id
INTEGER |
The review ID. |
body
STRING |
The description of the review. |
commit_id
STRING |
The ID of the commit the review was performed on. Reference: |
html_url
STRING |
The HTML URL to the review. |
pull_request_url
STRING |
The URL to the pull request being reviewed. |
state
STRING |
The state of the review. Possible values are:
|
user__id
INTEGER |
The user ID. Reference: |
user__login
STRING |
The user’s GitHub username. |
stargazers
Replication Method : |
Key-based Incremental |
Replication Key |
starred_at |
Primary Key |
user_id |
API endpoint : |
The stargazers table contains info about users who have starred a repository.
user_id
INTEGER |
The user ID. |
starred_at
STRING |
The time the user starred the repository. |
user__id
INTEGER |
The user ID. |