Create a data source using the Stitch Connect API.


Prerequisites

  • Access to Stitch Connect and valid Connect API credentials. Connect access is available only on certain Stitch plans. Refer to the Connect API reference for more info on obtaining API credentials.


Step 1: Get the source's API type

To get started, you’ll need to identify the API type of the data source you want to create. Every data source available in the Connect API has a type, which typically takes the form platform.<source-type>.

For example: The API type for a Recurly source is platform.recurly.

Refer to the Destination and Source API Availability Reference to locate the API type for your data source.


Step 2: Get the source's report card

Next, get the report card for the source you want to create. The report card lists the steps required to fully configure a source and the properties each step expects.

Use the GET /v4/source-types/{source_type} endpoint to get the report card for the source. In this example, we’re retrieving the report card for a platform.recurly source:

GET /v4/source-types/{source_type}
curl "https://api.stitchdata.com/v4/source-types/platform.recurly" \
     -H 'Authorization: Bearer [ACCESS_TOKEN]' \
     -H 'Content-Type: application/json'
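
For readers scripting this call instead of running curl, the same request can be built with Python's standard library. This is a minimal sketch: the helper names report_card_request and fetch_report_card are illustrative and not part of the Connect API, while the URL and headers are copied from the curl example above.

```python
import json
import urllib.request

API_BASE = "https://api.stitchdata.com"  # base URL from the curl example


def report_card_request(source_type, access_token):
    """Build (but do not send) the GET request for a source type's report card."""
    return urllib.request.Request(
        f"{API_BASE}/v4/source-types/{source_type}",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="GET",
    )


def fetch_report_card(source_type, access_token):
    """Send the request and decode the JSON body (requires network access)."""
    request = report_card_request(source_type, access_token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the URL and headers without contacting the API.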

The response will be a Source object with a Connection step object:

Response for GET /v4/source-types/{source_type}
{
  "type": "platform.recurly",
  "current_step": 1,
  "current_step_type": "form",
  "steps": [
    {
      "type": "form",
      "properties": [
        {
          "name": "anchor_time",
          "is_required": false,
          "is_credential": false,
          "system_provided": false,
          "property_type": "user_provided",
          "json_schema": {
            "type": "string",
            "format": "date-time"
          },
          "provided": false,
          "tap_mutable": false
        },
        {
          "name": "cron_expression",
          "is_required": false,
          "is_credential": false,
          "system_provided": false,
          "property_type": "user_provided",
          "json_schema": null,
          "provided": false,
          "tap_mutable": false
        },
        {
          "name": "frequency_in_minutes",
          "is_required": false,
          "is_credential": false,
          "system_provided": false,
          "property_type": "user_provided",
          "json_schema": {
            "type": "string",
            "pattern": "^1$|^30$|^60$|^360$|^720$|^1440$"
          },
          "provided": false,
          "tap_mutable": false
        },
        {
          "name": "image_version",
          "is_required": true,
          "is_credential": false,
          "system_provided": true,
          "property_type": "read_only",
          "json_schema": null,
          "provided": false,
          "tap_mutable": false
        },
        {
          "name": "start_date",
          "is_required": true,
          "is_credential": false,
          "system_provided": false,
          "property_type": "user_provided",
          "json_schema": {
            "type": "string",
            "pattern": "^\\d{4}-\\d{2}-\\d{2}T00:00:00Z$"
          },
          "provided": false,
          "tap_mutable": false
        },
        {
          "name": "api_key",
          "is_required": true,
          "is_credential": true,
          "system_provided": false,
          "property_type": "user_provided",
          "json_schema": {
            "type": "string"
          },
          "provided": false,
          "tap_mutable": false
        },
        {
          "name": "subdomain",
          "is_required": true,
          "is_credential": false,
          "system_provided": false,
          "property_type": "user_provided",
          "json_schema": {
            "type": "string"
          },
          "provided": false,
          "tap_mutable": false
        },
        {
          "name": "quota_limit",
          "is_required": true,
          "is_credential": false,
          "system_provided": false,
          "property_type": "user_provided",
          "json_schema": {
            "anyOf": [
              {
                "type": "integer"
              },
              {
                "type": "string",
                "pattern": "^\\d+"
              }
            ]
          },
          "provided": false,
          "tap_mutable": false
        }
      ]
    },
    {
      "type": "discover_schema",
      "properties": []
    },
    {
      "type": "field_selection",
      "properties": []
    },
    {
      "type": "fully_configured",
      "properties": []
    }
  ],
  "details": {
    "pricing_tier": "standard",
    "pipeline_state": "released",
    "default_start_date": "-1 year",
    "default_scheduling_interval": 60,
    "protocol": "platform.recurly",
    "access": true
  }
}

Note: To create the source in your account, the details.access property must be true. This indicates that the plan your Stitch account is using has access to the source.

For Recurly sources, the following steps are required to fully configure the source:

  1. The form step. Provide values for all required user-provided properties. These properties have an is_required: true value and a property_type: user_provided value. Refer to the Recurly Source Form Property documentation for more info about these properties.

  2. The discover_schema step. Stitch runs a connection check to test the provided form properties and detects the streams and fields available in the source. If all form properties are valid, including credentials, Stitch will automatically advance to the next step without any action required on your part.

    If the connection check fails, the source will remain on this step until a successful connection check completes.

  3. The field_selection step. Select the streams and fields you want to replicate.
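
The rule from the form step (is_required: true together with property_type: user_provided) can be applied to a report card programmatically. The sketch below assumes the report card has already been parsed into a Python dict shaped like the response above; the function names and the trimmed sample are illustrative.

```python
def required_form_properties(report_card):
    """Return names of properties the user must supply in the form step."""
    names = []
    for step in report_card.get("steps", []):
        if step.get("type") != "form":
            continue
        for prop in step.get("properties", []):
            if prop.get("is_required") and prop.get("property_type") == "user_provided":
                names.append(prop["name"])
    return names


def account_has_access(report_card):
    """True when details.access is true, i.e. the plan includes this source."""
    return report_card.get("details", {}).get("access") is True


# Trimmed copy of the Recurly report card shown above.
sample = {
    "type": "platform.recurly",
    "steps": [
        {"type": "form", "properties": [
            {"name": "frequency_in_minutes", "is_required": False, "property_type": "user_provided"},
            {"name": "image_version", "is_required": True, "property_type": "read_only"},
            {"name": "start_date", "is_required": True, "property_type": "user_provided"},
            {"name": "api_key", "is_required": True, "property_type": "user_provided"},
            {"name": "subdomain", "is_required": True, "property_type": "user_provided"},
            {"name": "quota_limit", "is_required": True, "property_type": "user_provided"},
        ]},
        {"type": "discover_schema", "properties": []},
    ],
    "details": {"access": True},
}

print(required_form_properties(sample))
print(account_has_access(sample))
```

For the Recurly report card this yields start_date, api_key, subdomain, and quota_limit. image_version is excluded because its property_type is read_only, and the scheduling properties are excluded because they are not marked as required.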


Step 3: Create the source and complete the form step

Use the POST /v4/sources endpoint to create the Recurly source. The request body must include the following properties:

  • type: The API type of the source. In this example, this value will be platform.recurly.
  • display_name: A descriptive name for the source. Stitch uses this value to generate the name of the destination schema or dataset that data from this source will be loaded into.

    For example: A display name of Recurly will create a destination schema named recurly. Note: The schema name can’t be changed after the source has been created.

  • properties: A Properties object containing the properties required to configure the source. Refer to the Source form property documentation for your source for more info about the required properties.

    For platform.recurly, the properties are:

    • anchor_time*

    • api_key

    • cron_expression*

    • frequency_in_minutes*

    • quota_limit

    • start_date

    • subdomain

    * While these properties have an is_required: false value, you must provide a replication schedule for the source. Refer to the Replication Scheduling for Sources Using the Connect API guide for more info and examples.
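
Before sending the request, it can help to check the property values locally. The sketch below tests string values against the pattern entries from the report card in Step 2 and confirms that a schedule property is present. It is not a full JSON Schema validator. Treating either frequency_in_minutes or cron_expression as a valid schedule is an assumption made for illustration; the scheduling guide remains the authoritative source.

```python
import re

# Patterns copied from the json_schema entries in the Recurly report card.
PATTERNS = {
    "start_date": r"^\d{4}-\d{2}-\d{2}T00:00:00Z$",
    "frequency_in_minutes": r"^1$|^30$|^60$|^360$|^720$|^1440$",
}
REQUIRED = ("start_date", "api_key", "subdomain", "quota_limit")
# Assumption: either of these properties defines the replication schedule.
SCHEDULE_KEYS = ("frequency_in_minutes", "cron_expression")


def check_properties(properties):
    """Return a list of problems found; an empty list means none were found."""
    problems = [
        f"missing required property: {name}"
        for name in REQUIRED
        if name not in properties
    ]
    for name, pattern in PATTERNS.items():
        value = properties.get(name)
        # JSON Schema patterns are unanchored searches, hence re.search.
        if value is not None and not re.search(pattern, str(value)):
            problems.append(f"{name} does not match {pattern}")
    if not any(key in properties for key in SCHEDULE_KEYS):
        problems.append("no replication schedule provided")
    return problems


body = {
    "type": "platform.recurly",
    "display_name": "Recurly",
    "properties": {
        "start_date": "2018-01-10T00:00:00Z",
        "api_key": "[RECURLY_API_KEY]",
        "frequency_in_minutes": "60",
        "quota_limit": "30",
        "subdomain": "stitchdata",
    },
}
print(check_properties(body["properties"]))  # prints []
```

A start_date such as 2018-01-10T12:30:00Z would be reported, because the pattern only accepts midnight UTC timestamps.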

This request will complete the form step outlined in the source’s report card, which you retrieved in Step 2:

POST /v4/sources
curl -X "POST" "https://api.stitchdata.com/v4/sources" \
     -H 'Authorization: Bearer [ACCESS_TOKEN]' \
     -H 'Content-Type: application/json' \
     -d \
'{
   "type":"platform.recurly",
   "display_name":"Recurly",
   "properties":{
      "start_date":"2018-01-10T00:00:00Z",
      "api_key":"[RECURLY_API_KEY]",
      "frequency_in_minutes":"60",
      "quota_limit":"30",
      "subdomain":"stitchdata"
   }
}'

The response will be a Source object containing the source’s ID, report card, and current configuration status (report_card.current_step_type):

Response for POST /v4/sources
{
   "type":"platform.recurly",
   "display_name":"Recurly",
   "properties":{
      "start_date":"2018-01-10T00:00:00Z",
      "api_key":"[RECURLY_API_KEY]",
      "frequency_in_minutes":"60",
      "quota_limit":"30",
      "subdomain":"stitchdata"
   }
}

Step 4: Complete the field selection step

Next, you’ll select the streams and fields you want to replicate from the source. The source will automatically progress from discover_schema to field_selection after a successful connection check completes.

Locate the source’s ID in the Source object returned when you created the source (in this example, it’s 233312) and follow the steps in the Select Streams and Fields with the Connect API guide.

To complete field selection, at least one stream and one field in the stream must be selected. This includes fields that are automatically selected. For example: If a stream uses an id field as a Primary Key, this field will be automatically included when the stream is selected.

Note: Stream and field selection may occur at any time when a source’s current_step_type is field_selection or fully_configured, as long as the source’s report card has a field_selection step.


Step 5: Check the source's configuration status

After field selection, the source’s configuration status should be fully_configured. Once the source is fully_configured, Stitch can begin replication using the schedule and stream/field selection data you provided.

You can verify the source’s configuration status by sending a request to GET /v4/sources/{source_id}, replacing {source_id} with the source’s ID:

GET /v4/sources/{source_id}
curl "https://api.stitchdata.com/v4/sources/233312" \
     -H 'Authorization: Bearer [ACCESS_TOKEN]' \
     -H 'Content-Type: application/json'

The response will be a Source object containing the source’s current configuration status (report_card.current_step_type):

Response for GET /v4/sources/{source_id}
{
  "properties": {
    "frequency_in_minutes": "60",
    "image_version": "1.latest",
    "quota_limit": "30",
    "start_date": "2018-01-10T00:00:00Z",
    "subdomain": "stitchdata"
  },
  "updated_at": "2020-03-20T19:09:03Z",
  "schedule": {
    "type": "interval",
    "unit": "minute",
    "interval": 60.0,
    "next_fire_time": "2020-03-20T19:09:38Z"
  },
  "name": "recurly",
  "type": "platform.recurly",
  "deleted_at": null,
  "system_paused_at": null,
  "stitch_client_id": 116078,
  "paused_at": null,
  "id": 233312,
  "display_name": "Recurly",
  "created_at": "2020-03-20T16:03:16Z",
  "report_card": {
    "type": "platform.recurly",
    "current_step": 4,
    "current_step_type": "fully_configured",       /* Configuration status */
    "steps": [
      {
        "type": "form",
        "properties": [
          {
            "name": "anchor_time",
            "is_required": false,
            "is_credential": false,
            "system_provided": false,
            "property_type": "user_provided",
            "json_schema": {
              "type": "string",
              "format": "date-time"
            },
            "provided": true,
            "tap_mutable": false
          },
          {
            "name": "cron_expression",
            "is_required": false,
            "is_credential": false,
            "system_provided": false,
            "property_type": "user_provided",
            "json_schema": null,
            "provided": false,
            "tap_mutable": false
          },
          {
            "name": "frequency_in_minutes",
            "is_required": false,
            "is_credential": false,
            "system_provided": false,
            "property_type": "user_provided",
            "json_schema": {
              "type": "string",
              "pattern": "^1$|^30$|^60$|^360$|^720$|^1440$"
            },
            "provided": true,
            "tap_mutable": false
          },
          {
            "name": "image_version",
            "is_required": true,
            "is_credential": false,
            "system_provided": true,
            "property_type": "read_only",
            "json_schema": null,
            "provided": true,
            "tap_mutable": false
          },
          {
            "name": "start_date",
            "is_required": true,
            "is_credential": false,
            "system_provided": false,
            "property_type": "user_provided",
            "json_schema": {
              "type": "string",
              "pattern": "^\\d{4}-\\d{2}-\\d{2}T00:00:00Z$"
            },
            "provided": true,
            "tap_mutable": false
          },
          {
            "name": "api_key",
            "is_required": true,
            "is_credential": true,
            "system_provided": false,
            "property_type": "user_provided",
            "json_schema": {
              "type": "string"
            },
            "provided": true,
            "tap_mutable": false
          },
          {
            "name": "subdomain",
            "is_required": true,
            "is_credential": false,
            "system_provided": false,
            "property_type": "user_provided",
            "json_schema": {
              "type": "string"
            },
            "provided": true,
            "tap_mutable": false
          },
          {
            "name": "quota_limit",
            "is_required": true,
            "is_credential": false,
            "system_provided": false,
            "property_type": "user_provided",
            "json_schema": {
              "anyOf": [
                {
                  "type": "integer"
                },
                {
                  "type": "string",
                  "pattern": "^\\d+"
                }
              ]
            },
            "provided": true,
            "tap_mutable": false
          }
        ]
      },
      {
        "type": "discover_schema",
        "properties": []
      },
      {
        "type": "field_selection",
        "properties": []
      },
      {
        "type": "fully_configured",
        "properties": []
      }
    ]
  }
}
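
The status check can be automated once the response is parsed. The sketch below assumes the Source object has been decoded into a Python dict; the function names are illustrative. It also assumes current_step is a 1-based index into steps, which is inferred from the examples in this guide rather than stated by the API reference.

```python
def configuration_status(source):
    """Return report_card.current_step_type from a parsed Source object."""
    return source["report_card"]["current_step_type"]


def is_fully_configured(source):
    """True when the source has reached the fully_configured step."""
    return configuration_status(source) == "fully_configured"


def remaining_steps(source):
    """List step types from the current step onward.

    Assumes current_step is 1-based, as the examples in this guide suggest.
    """
    card = source["report_card"]
    return [step["type"] for step in card["steps"][card["current_step"] - 1:]]


# Trimmed copy of the Source object shown above.
source = {
    "id": 233312,
    "report_card": {
        "type": "platform.recurly",
        "current_step": 4,
        "current_step_type": "fully_configured",
        "steps": [
            {"type": "form"},
            {"type": "discover_schema"},
            {"type": "field_selection"},
            {"type": "fully_configured"},
        ],
    },
}

print(is_fully_configured(source))  # prints True
print(remaining_steps(source))      # prints ['fully_configured']
```

If the status is still discover_schema, the connection check has not yet succeeded, and remaining_steps shows what is left before replication can begin.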

Step 6: Start a replication job

Now that the source is fully_configured, you can start extracting data.

Stitch will automatically schedule a replication job based on the schedule you set. To see when the next replication job will begin, check the schedule.next_fire_time value in the Source object:

Example schedule in a Source object
{
   "properties": {...},
   "updated_at": "2020-03-20T19:09:03Z",
   "schedule":{
      "type":"interval",
      "unit":"minute",
      "interval":60.0,
      "next_fire_time":"2020-03-20T19:09:38Z"
   },
   [...]
}
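
To turn next_fire_time into a wait time, the timestamp can be parsed and compared with the current time. This is a minimal sketch that assumes the value is always a UTC timestamp in the second-resolution format shown above; the function name is illustrative.

```python
from datetime import datetime, timezone


def seconds_until_next_job(schedule, now=None):
    """Seconds from `now` (default: current UTC time) until next_fire_time."""
    next_fire = datetime.strptime(
        schedule["next_fire_time"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (next_fire - now).total_seconds()


schedule = {
    "type": "interval",
    "unit": "minute",
    "interval": 60.0,
    "next_fire_time": "2020-03-20T19:09:38Z",
}
# Use the updated_at value from the example as the reference time.
now = datetime(2020, 3, 20, 19, 9, 3, tzinfo=timezone.utc)
print(seconds_until_next_job(schedule, now))  # prints 35.0
```

A negative result means the scheduled time has already passed.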

If you want to start a replication job sooner than the next_fire_time, you can send a request to POST /v4/sources/{source_id}/sync, replacing {source_id} with the source’s ID:

POST /v4/sources/{source_id}/sync
curl -X "POST" "https://api.stitchdata.com/v4/sources/233312/sync" \
     -H 'Authorization: Bearer [ACCESS_TOKEN]' \
     -H 'Content-Type: application/json'

Note: Stitch allows only one replication job to run at a time. The response to the above request may be either of the following:

  • If the job started successfully, the response will be a single Replication Job object:

    Response for POST /v4/sources/{source_id}/sync indicating a job was started
    {
     "job_name":"116078.233312.sync.c12fb0a7-7e4a-11e9-abdc-0edc2c318fba"
    }
    
  • If a job was already in progress, the response will be a single error object similar to:

    Response for POST /v4/sources/{source_id}/sync indicating a job is already running
    {
     "error":{
        "type":"already_running",
        "message":"Did not create job for client-id: 116078; connection-id: 233312 because one already exists"
     }
    }
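
A client can tell the two responses apart by checking for job_name or for an error of type already_running. The sketch below assumes the response body has been parsed into a Python dict; the function name and the returned labels are illustrative, and any other payload shape is reported as unknown rather than guessed at.

```python
def interpret_sync_response(payload):
    """Classify a parsed response from POST /v4/sources/{source_id}/sync."""
    if "job_name" in payload:
        return ("started", payload["job_name"])
    error = payload.get("error") or {}
    if error.get("type") == "already_running":
        return ("already_running", error.get("message", ""))
    return ("unknown", payload)


started = {"job_name": "116078.233312.sync.c12fb0a7-7e4a-11e9-abdc-0edc2c318fba"}
running = {
    "error": {
        "type": "already_running",
        "message": "Did not create job for client-id: 116078; "
                   "connection-id: 233312 because one already exists",
    }
}

print(interpret_sync_response(started)[0])  # prints started
print(interpret_sync_response(running)[0])  # prints already_running
```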
    

Next steps

Congratulations on configuring a data source using the Connect API! Check out the Tutorials and resources to see what else you can do with Stitch Connect.