GitHub integration summary

Stitch’s GitHub integration replicates data using the GitHub REST API v3. Refer to the Schema section for a list of objects available for replication.

GitHub feature snapshot

A high-level look at Stitch's GitHub (v1) integration, including release status, useful links, and the features supported in Stitch.

STITCH

  • Release status: Released on June 4, 2018
  • Supported by: Singer Community
  • Stitch plan: Standard
  • API availability: Available
  • Singer GitHub repository: singer-io/tap-github

REPLICATION SETTINGS

  • Anchor Scheduling: Supported
  • Advanced Scheduling: Supported
  • Table-level reset: Unsupported
  • Configurable Replication Methods: Unsupported

DATA SELECTION

  • Table selection: Supported
  • Column selection: Supported
  • Select all: Supported

TRANSPARENCY

  • Extraction Logs: Supported
  • Loading Reports: Supported

Connecting GitHub

GitHub setup requirements

To set up GitHub in Stitch, you need:

  • Access to the projects you want to replicate data from. Stitch will only be able to access the same projects as the user who creates the access token.


Step 1: Create a GitHub token

  1. Sign into your GitHub account.
  2. Click the User menu (your icon) > Settings.
  3. Click Developer settings in the navigation on the left side of the page.
  4. Click Personal access tokens.
  5. On the Personal access tokens page, click the Generate new token button. If prompted, enter your password.
  6. In the Description field, enter stitch. This will allow you to easily identify which application is using the token.
  7. In the Select Scopes section, check the repo option:

    [Image: Highlighted repo scopes on the GitHub Personal Access Tokens page]

    Note: While these are full permissions, Stitch will only ever read your data. The repo scope is required due to how GitHub structures permissions.

  8. Click the Generate token button.
  9. The new access token will display on the next page. Copy the token before navigating away from the page - GitHub won’t display it again.
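Before pasting the token into Stitch, it can help to confirm that the token can read a repository you plan to replicate. The following is a minimal sketch, not part of Stitch itself: `build_repo_request` is a hypothetical helper name and the token value is a placeholder. It assumes the GitHub REST API v3 base URL https://api.github.com and the `Authorization: token ...` header that GitHub accepts for personal access tokens. The request is only built here, not sent.

```python
import urllib.request

GITHUB_API = "https://api.github.com"


def build_repo_request(token: str, repo_path: str) -> urllib.request.Request:
    """Build a GET request for a repository path such as 'stitchdata/docs'."""
    return urllib.request.Request(
        f"{GITHUB_API}/repos/{repo_path}",
        headers={
            # Personal access tokens are sent in the Authorization header.
            "Authorization": f"token {token}",
            # Ask explicitly for the v3 media type.
            "Accept": "application/vnd.github.v3+json",
        },
    )


# Placeholder token: replace with the token copied in step 9.
request = build_repo_request("YOUR_TOKEN_HERE", "stitchdata/docs")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` sends it. A successful response suggests the token can see the repository, while an authentication or not-found error suggests the token or its scopes need another look.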

Step 2: Add GitHub as a Stitch data source

  1. Sign into your Stitch account.
  2. On the Stitch Dashboard page, click the Add Integration button.

  3. Click the GitHub icon.

  4. Enter a name for the integration. This is the name that will display on the Stitch Dashboard for the integration; it’ll also be used to create the schema in your destination.

    For example, the name “Stitch GitHub” would create a schema called stitch_github in the destination. Note: Schema names cannot be changed after you save the integration.

  5. In the GitHub Access Token field, paste the access token you created in Step 1.
  6. In the GitHub Repository Name field, enter the paths of the repositories you want to track. Each path is relative to https://github.com. For example, the path for the Stitch Docs repository is stitchdata/docs.

    To track multiple repositories, enter a space-delimited list of the repository paths. For example: stitchdata/docs stitchdata/docs-about-docs
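The two free-text fields in this step follow simple conventions that can be checked before saving. The sketch below is illustrative only: `to_schema_name` and `parse_repositories` are hypothetical helpers, and the normalization rule (lowercase, non-alphanumeric runs become underscores) is an assumption inferred from the "Stitch GitHub" example, not Stitch's documented algorithm.

```python
import re
from typing import List


def to_schema_name(integration_name: str) -> str:
    """Approximate the schema name: lowercase, non-alphanumerics become '_'."""
    lowered = integration_name.strip().lower()
    return re.sub(r"[^a-z0-9]+", "_", lowered).strip("_")


def parse_repositories(field_value: str) -> List[str]:
    """Split a space-delimited field into 'owner/repo' paths and validate each."""
    paths = field_value.split()
    for path in paths:
        # Each path must be exactly two non-empty segments separated by '/'.
        if not re.fullmatch(r"[^/\s]+/[^/\s]+", path):
            raise ValueError(f"Expected 'owner/repo', got {path!r}")
    return paths


print(to_schema_name("Stitch GitHub"))
print(parse_repositories("stitchdata/docs stitchdata/docs-about-docs"))
```

A full URL such as https://github.com/stitchdata/docs fails the check, which matches the instruction to enter the path relative to https://github.com.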

Step 3: Define the historical replication start date

The Sync Historical Data setting defines the starting date for your GitHub integration. This means that:

  • For tables using Key-based Incremental Replication, data equal to or newer than this date will be replicated to your destination.
  • For tables using Full Table Replication, all data - including records that are older, equal to, or newer than this date - will be replicated to your destination.

Change this setting if you want to replicate data beyond GitHub’s default setting of 1 year. For a detailed look at historical replication jobs, check out the Syncing Historical SaaS Data guide.
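The difference between the two replication methods can be sketched in a few lines. This is a simplified model under stated assumptions, not Stitch's implementation: `records_to_replicate` is a hypothetical function, and the sample records and dates are made up for illustration.

```python
from datetime import datetime


def records_to_replicate(records, replication_method, start_date,
                         replication_key="updated_at"):
    """Return the records a historical job would pick up for one table."""
    if replication_method == "Full Table":
        # Full Table ignores the start date: every record is replicated.
        return list(records)
    if replication_method == "Key-based Incremental":
        # Keep records whose key is equal to or newer than the start date.
        return [r for r in records if r[replication_key] >= start_date]
    raise ValueError(f"Unknown replication method: {replication_method!r}")


sample = [
    {"id": 1, "updated_at": datetime(2017, 3, 1)},
    {"id": 2, "updated_at": datetime(2018, 6, 4)},
    {"id": 3, "updated_at": datetime(2019, 1, 15)},
]
start = datetime(2018, 6, 4)

incremental_ids = [r["id"] for r in
                   records_to_replicate(sample, "Key-based Incremental", start)]
full_table_ids = [r["id"] for r in
                  records_to_replicate(sample, "Full Table", start)]
print(incremental_ids)  # [2, 3]
print(full_table_ids)   # [1, 2, 3]
```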

Step 4: Create a replication schedule

In the Replication Frequency section, you’ll create the integration’s replication schedule. An integration’s replication schedule determines how often Stitch runs a replication job, and the time that job begins.

GitHub integrations support the following replication scheduling methods:

  • Anchor Scheduling
  • Advanced Scheduling

To keep your row usage low, consider setting the integration to replicate less frequently. See the Understanding and Reducing Your Row Usage guide for tips on reducing your usage.
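As a mental model of how a frequency and a fixed starting time combine into a schedule, the sketch below computes the next job start from an anchor time and an interval. It is an assumption-laden illustration: `next_job_start` is a hypothetical function, and Stitch's actual scheduler may behave differently, for example when a previous job is still running.

```python
from datetime import datetime, timedelta


def next_job_start(anchor: datetime, frequency_minutes: int,
                   now: datetime) -> datetime:
    """Return the first anchor + k * frequency that is not before `now`."""
    if now <= anchor:
        return anchor
    step = timedelta(minutes=frequency_minutes)
    # Ceiling division on timedeltas: number of whole steps needed to reach now.
    steps = -(-(now - anchor) // step)
    return anchor + steps * step


anchor = datetime(2018, 6, 4, 0, 0)
upcoming = next_job_start(anchor, 360, datetime(2018, 6, 4, 7, 30))
print(upcoming)  # 2018-06-04 12:00:00
```

Under this model, a longer interval means fewer jobs per day, which is one way to reduce how often the same rows are replicated.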

Initial and historical replication jobs

After you finish setting up GitHub, its Sync Status may show as Pending either on the Stitch Dashboard or on the Integration Details page.

For a new integration, a Pending status indicates that Stitch is in the process of scheduling the initial replication job for the integration. This may take some time to complete.

Free historical data loads

The first seven days of replication, beginning when data is first replicated, are free. Rows replicated from the new integration during this time won’t count towards your quota. Stitch offers this as a way of testing new integrations, measuring usage, and ensuring historical data volumes don’t quickly consume your quota.
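To estimate how the free period affects usage, rows can be split by whether they were replicated inside the seven-day window. The sketch below is a rough estimate only: `billable_rows` is a hypothetical function, the load figures are invented, and treating the window as ending exactly seven days after the first replication is an assumption about the boundary.

```python
from datetime import datetime, timedelta

FREE_PERIOD = timedelta(days=7)


def billable_rows(first_replicated_at, loads):
    """Sum rows loaded after the free window.

    `loads` is an iterable of (loaded_at, row_count) pairs.
    """
    free_until = first_replicated_at + FREE_PERIOD
    return sum(rows for loaded_at, rows in loads if loaded_at >= free_until)


first = datetime(2018, 6, 4, 9, 0)
loads = [
    (datetime(2018, 6, 4, 9, 0), 500_000),   # historical load, inside window
    (datetime(2018, 6, 8, 9, 0), 20_000),    # inside window
    (datetime(2018, 6, 12, 9, 0), 15_000),   # after window
]
counted = billable_rows(first, loads)
print(counted)  # 15000
```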


GitHub table reference

assignees

Replication Method :

Full Table

Primary Key :

id

API endpoint :

listAssignees

The assignees table contains info about the available assignees for issues in a repository.

id
INTEGER

The assignee ID.

login
STRING

The user’s username.

type
STRING

The user’s type.

url
STRING

The profile URL associated with the user.


collaborators

Replication Method :

Full Table

Primary Key :

id

API endpoint :

listCollaborators

The collaborators table contains info about the users who contribute to a repository.

For organization-owned repositories, this will include outside collaborators, organization owners, and organization members who are direct collaborators, who have access through team memberships, or who have access through default organization permissions.

id
INTEGER

The collaborator’s ID.

login
STRING

The collaborator’s username.

type
STRING

The collaborator’s type.

url
STRING

The profile URL associated with the collaborator.


comments

Replication Method :

Key-based Incremental

Replication Key :

updated_at

Primary Key :

id

API endpoint :

List comments on a pull request

The comments table contains info about comments made on issues.

id
INTEGER

The comment ID.

updated_at
DATE-TIME

The time the comment was last updated.

body
STRING

The body of the comment.

created_at
DATE-TIME

The time the comment was created.

home_url
STRING

The home URL of the comment.

html_url
STRING

The HTML URL of the comment.

issue_url
STRING

The URL of the issue associated with the comment.

node_id
STRING

The node ID.

url
STRING

The GitHub URL of the comment.

user
OBJECT

Details about the user who created the comment.

login
STRING

The login name of the user who created the comment.

id
STRING

The ID of the user who created the comment.

node_id
STRING

The node ID of the user who created the comment.

avatar_url
STRING

The URL of the avatar of the user who created the comment.

gravatar_id
STRING

The Gravatar ID of the user who created the comment.

url
STRING

The API URL of the user who created the comment.

html_url
STRING

The GitHub URL of the user who created the comment.

followers_url
STRING

The URL to the user’s followers page.

following_url
STRING

The URL to the user’s following page.

gists_url
STRING

The URL to the user’s gists page.

starred_url
STRING

The URL to the user’s starred page.

subscriptions_url
STRING

The URL to the user’s subscriptions page.

organizations_url
STRING

The URL to the user’s organizations page.

repos_url
STRING

The URL to the user’s repositories page.

events_url
STRING

The URL to the user’s events page.

received_events_url
STRING

The URL to the user’s received events page.

type
STRING

The type of the user.

site_admin
STRING

Indicates if the user is a site administrator.

comments (table), user (attribute)

commits

Replication Method :

Key-based Incremental

Replication Key :

since

Primary Key :

sha

API endpoint :

listRepositoryCommits

The commits table contains info about repository commits in a project.

sha
STRING

The git commit hash.

comments_url
STRING

The URL to the commit’s comments page.

commit
OBJECT

Details about the commit.

url
STRING

The URL to the commit.

tree
OBJECT

Details about the commit tree.

sha
STRING

The git commit tree hash.

url
STRING

The URL to the commit tree.

commits (table), tree (attribute)

author
OBJECT

Details about the author of the commit.

date
STRING

The date the author committed the change.

email
STRING

The author’s email address.

name
STRING

The author’s name.

commits (table), author (attribute)

message
STRING

The commit message.

committer
OBJECT

Details about the user who committed the change.

date
STRING

The date the committer committed the change.

email
STRING

The committer’s email address.

name
STRING

The committer’s name.

commits (table), committer (attribute)

comment_count
INTEGER

The number of comments on the commit.

commits (table), commit (attribute)

html_url
STRING

The HTML URL to the commit.

parents
ARRAY

Details about the parent commits.

sha
STRING

The git hash of the parent commit.

html_url
STRING

The HTML URL to the parent commit.

url
STRING

The URL to the parent commit.

commits (table), parents (attribute)

url
STRING

The URL to the commit.
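Because most commit details sit inside the nested commit object, a small helper can flatten a record into the fields people usually query. This is a sketch under assumptions: `summarize_commit` is a hypothetical function, the sample record is invented, and it presumes records arrive as nested dictionaries shaped like the attributes listed above.

```python
def summarize_commit(record: dict) -> dict:
    """Flatten the nested commit attributes into a single-level summary."""
    commit = record.get("commit") or {}
    author = commit.get("author") or {}
    message = commit.get("message") or ""
    return {
        "sha": record["sha"],
        "author_name": author.get("name"),
        "author_date": author.get("date"),
        # The first line of a commit message is conventionally its subject.
        "subject": message.splitlines()[0] if message else "",
        # A commit with more than one parent is a merge commit.
        "is_merge": len(record.get("parents") or []) > 1,
    }


sample_commit = {
    "sha": "abc123",
    "commit": {
        "author": {"name": "Example Author", "email": "author@example.com",
                   "date": "2018-06-04T12:00:00Z"},
        "message": "Fix typo in setup guide\n\nLonger explanation here.",
        "comment_count": 0,
    },
    "parents": [{"sha": "def456"}],
}
summary = summarize_commit(sample_commit)
print(summary)
```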


events

Replication Method :

Key-based Incremental

Replication Key :

created_at

Primary Key :

id

API endpoint :

List events

The events table contains information about events in your GitHub repositories.

id
NUMBER

The event ID.

created_at
STRING

The date the event was created.

actor
OBJECT

Information about the user that triggered an event.

avatar_url
STRING

display_login
STRING

gravatar_id
STRING

id
NUMBER

login
STRING

url
STRING

events (table), actor (attribute)

distinct_size
NUMBER

The number of distinct commits in a push.

head
STRING

The SHA of the most recent commit on ref after the push.

org
OBJECT

Information about the organization.

avatar_url
STRING

gravatar_id
STRING

id
NUMBER

login
STRING

url
STRING

events (table), org (attribute)

payload
OBJECT

Information about the event's payload.

action
STRING

before
STRING

comment
STRING

commits
ARRAY

author
OBJECT

email
STRING

name
STRING

events (table), author (attribute)

distinct
BOOLEAN

message
STRING

sha
STRING

url
STRING

events (table), commits (attribute)

description
STRING

issue
STRING

master_branch
STRING

pusher_type
STRING

ref
STRING

ref_type
STRING

events (table), payload (attribute)

public
BOOLEAN

Indicates whether the event is public.

push_id
NUMBER

The push ID.

ref
STRING

The full git ref that was pushed.

repo
OBJECT

Information about the repository where the event occurred.

id
NUMBER

name
STRING

url
STRING

events (table), repo (attribute)

size
NUMBER

The number of commits in the push.

type
STRING

The event type.


issues

Replication Method :

Key-based Incremental

Replication Key :

updated_at

Primary Key :

id

API endpoint :

listIssuesForRepository

The issues table contains info about repository issues.

Issues and pull requests

GitHub’s API considers every pull request an issue, but not every issue is a pull request. Therefore, this table may contain both issues and pull requests.
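To separate the two kinds of records, one approach is to look for the `pull_request` key, which GitHub's REST API includes on issues that are pull requests. The sketch below assumes that key survives replication, which this page does not confirm; `is_pull_request` is a hypothetical helper and the sample records are invented.

```python
def is_pull_request(issue_record: dict) -> bool:
    """Treat a record as a pull request when it carries a pull_request object."""
    return issue_record.get("pull_request") is not None


sample_issues = [
    {"id": 101, "title": "Docs typo"},
    {"id": 102, "title": "Fix docs typo",
     "pull_request": {"url": "https://api.github.com/repos/stitchdata/docs/pulls/2"}},
]
plain_issues = [r["id"] for r in sample_issues if not is_pull_request(r)]
pull_request_issues = [r["id"] for r in sample_issues if is_pull_request(r)]
print(plain_issues, pull_request_issues)
```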

id
INTEGER

The issue ID.

updated_at
DATE-TIME

The last time the issue was updated.


project_cards

Replication Method :

Key-based Incremental

Replication Key :

updated_at

Primary Key :

id

API endpoint :

listProjectCards

The project_cards table contains information about cards in your GitHub project.

id
NUMBER

The project card ID.

updated_at
DATE-TIME

The time the card was last updated.

_sdc_repository
STRING

archived
BOOLEAN

Whether or not the card has been archived.

cards_url
STRING

The URL where the cards are located.

column_url
STRING

The column URL.

content_url
STRING

The content URL.

created_at
DATE-TIME

The time the card was created.

creator
OBJECT

Information about the card’s creator.

id
NUMBER

login
STRING

project_cards (table), creator (attribute)

name
STRING

The name of the card.

node_id
STRING

The card’s node ID.

note
STRING

Notes in the card.

project_url
STRING

The project URL.

url
STRING

The card URL.


pull_requests

Replication Method :

Full Table

Primary Key :

id

API endpoint :

listPullRequests

The pull_requests table contains info about pull requests made against the repository.

id
STRING

The pull request ID.

updated_at
DATE-TIME

The last time the pull request was updated.

body
STRING

The description of the pull request.

closed_at
STRING

The time the pull request was closed.

created_at
STRING

The time the pull request was created.

merged_at
STRING

The time the pull request was merged.

number
INTEGER

The number of the pull request in the repository.

state
STRING

The current status of the pull request. For example: open

title
STRING

The title of the pull request.

url
STRING

The URL to the pull request.

user
OBJECT

Details about the user who created the pull request.

id
INTEGER

The user ID.

login
STRING

The user’s GitHub username.

pull_requests (table), user (attribute)
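Since created_at and merged_at are typed as STRING in this table, they need parsing before any date arithmetic. The sketch below computes time to merge, assuming the values keep the ISO 8601 form that GitHub's API returns (for example 2018-06-04T12:00:00Z); a destination may store them differently. `parse_github_timestamp` and `time_to_merge` are hypothetical helpers, and the sample record is invented.

```python
from datetime import datetime, timedelta
from typing import Optional

GITHUB_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_github_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp such as '2018-06-04T12:00:00Z'."""
    return datetime.strptime(value, GITHUB_TIME_FORMAT)


def time_to_merge(pull_request: dict) -> Optional[timedelta]:
    """Return merged_at minus created_at, or None if the PR was never merged."""
    merged_at = pull_request.get("merged_at")
    if not merged_at:
        return None
    created = parse_github_timestamp(pull_request["created_at"])
    return parse_github_timestamp(merged_at) - created


sample_pr = {
    "id": "1",
    "state": "closed",
    "created_at": "2018-06-04T12:00:00Z",
    "merged_at": "2018-06-05T15:30:00Z",
}
elapsed = time_to_merge(sample_pr)
print(elapsed)  # 1 day, 3:30:00
```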

releases

Replication Method :

Full Table

Primary Key :

id

API endpoint :

listReleases

The releases table contains a list of releases. Note: GitHub doesn’t include regular Git tags that haven’t been associated with a release.

id
STRING

The release ID.

_sdc_repository
STRING

author
OBJECT

Details about the author of the release.

id
INTEGER

The user ID of the author.

login
STRING

The username of the author.

releases (table), author (attribute)

body
STRING

The text describing the tag.

created_at
DATE-TIME

The date the release was created.

draft
BOOLEAN

If TRUE, the release is a draft release.

html_url
STRING

The HTML URL to the release.

name
STRING

The name of the release.

prerelease
BOOLEAN

If TRUE, the release is a pre-release.

published_at
DATE-TIME

The date the release was published.

tag_name
STRING

The name of the tag.

target_commitish
STRING

The commitish value that determines where the Git tag was created.

url
STRING

The URL to the release.


review_comments

Replication Method :

Key-based Incremental

Replication Key :

updated_at

Primary Key :

id

API endpoint :

List comments on a pull request

The review_comments table contains info about comments made on pull request reviews.

Note: In order to replicate this table, you must also set the pull_requests table to replicate.

id
INTEGER

The review comment ID.

updated_at
DATE-TIME

The time the review comment was last updated.

body
STRING

The body of the review comment.

commit_id
STRING

The ID of the commit the review comment is associated with.

created_at
DATE-TIME

The time the review comment was created.

diff_url
STRING

The diff URL associated with the review comment.

html_url
STRING

The HTML URL of the review comment.

in_reply_to_id
INTEGER

If the review comment is a reply to another review comment, this will be the ID of the review comment it is in response to.

issue_url
STRING

The URL of the issue associated with the review comment.

node_id
STRING

The review comment’s node ID.

original_position
INTEGER

The original position of the review comment.

original_commit_id
STRING

The ID of the original commit the review comment is associated with.

pull_request_review_id
INTEGER

The ID of the pull request review the comment is a part of.

path
STRING

The path of the file the review comment was made on.

position
INTEGER

The position of the review comment.

pull_request_url
STRING

The URL of the pull request associated with the review comment.

url
STRING

The GitHub URL of the review comment.

user
OBJECT

Details about the user who created the review comment.

login
STRING

The login name of the user who created the review comment.

id
STRING

The ID of the user who created the review comment.

review_comments (table), user (attribute)

reviews

Replication Method :

Full Table

Primary Key :

id

API endpoint :

listReviewsOnPullRequest

The reviews table contains info about pull request reviews. A pull request review is a group of comments on a pull request.

Note: In order to replicate this table, you must also set the pull_requests table to replicate.
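The dependency on pull_requests, which applies to both the review_comments and reviews tables, can be checked against a planned table selection before saving. This is a minimal sketch: `REQUIRES` and `missing_dependencies` are hypothetical names, and the mapping contains only the two dependencies stated on this page.

```python
# Tables that only replicate when the listed tables are also selected.
REQUIRES = {
    "review_comments": {"pull_requests"},
    "reviews": {"pull_requests"},
}


def missing_dependencies(selected_tables):
    """Map each selected table to the required tables that are not selected."""
    selected = set(selected_tables)
    missing = {}
    for table in sorted(selected):
        unmet = REQUIRES.get(table, set()) - selected
        if unmet:
            missing[table] = sorted(unmet)
    return missing


problems = missing_dependencies(["issues", "reviews", "review_comments"])
print(problems)
```

An empty result means the selection satisfies both notes.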

id
INTEGER

The review ID.

body
STRING

The description of the review.

commit_id
STRING

The ID of the commit the review was performed on.

html_url
STRING

The HTML URL to the review.

pull_request_url
STRING

The URL to the pull request being reviewed.

state
STRING

The state of the review. Possible values are:

  • APPROVED
  • PENDING
  • CHANGES_REQUESTED

user
OBJECT

Details about the user who submitted the review.

id
INTEGER

The user ID.

login
STRING

The user’s GitHub username.

reviews (table), user (attribute)

stargazers

Replication Method :

Key-based Incremental

Replication Key :

starred_at

Primary Key :

user_id

API endpoint :

listStargazers

The stargazers table contains info about users who have starred a repository.

user_id
INTEGER

The user ID.

starred_at
STRING

The time the user starred the repository.

user
OBJECT

Details about the user who starred the repository.

id
INTEGER

The user ID.

stargazers (table), user (attribute)
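The combination of a Primary Key (user_id) and a Replication Key (starred_at) can be pictured as an upsert plus a saved bookmark. The sketch below is a simplified model rather than Stitch's loader: `apply_incremental_batch` is a hypothetical function, the records are invented, and comparing starred_at values as strings assumes they all share the same ISO 8601 UTC format.

```python
def apply_incremental_batch(destination: dict, batch: list, bookmark=None):
    """Upsert records by user_id and return the new starred_at bookmark."""
    for record in batch:
        # A record with an existing primary key replaces the stored row.
        destination[record["user_id"]] = record
    candidates = [record["starred_at"] for record in batch]
    if bookmark is not None:
        candidates.append(bookmark)
    # The bookmark is the newest replication key value seen so far.
    return max(candidates) if candidates else None


table = {}
bookmark = apply_incremental_batch(table, [
    {"user_id": 1, "starred_at": "2018-06-04T12:00:00Z"},
    {"user_id": 2, "starred_at": "2018-06-05T08:00:00Z"},
])
bookmark = apply_incremental_batch(table, [
    {"user_id": 1, "starred_at": "2018-06-06T09:00:00Z"},  # user 1 starred again
], bookmark)
print(len(table), bookmark)
```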

