Stitch enforces field selection and compatibility rules to ensure that selected streams include the fields required for replication and that the selected fields are compatible with one another. This guide covers the metadata properties that control field inclusion in the Connect API.


Field types

Stitch requires two types of fields for stream replication: Primary Keys and, when applicable, Replication Keys.

Primary Key fields

To accurately replicate data, Stitch requires Primary Key information for each stream. A Primary Key is a column or set of columns that uniquely identifies a record.

Depending on the source and stream type, this is handled in one of several ways.

Database sources

For database sources, Stitch will typically query the database’s information schema to determine the Primary Key fields and then store their names as a list in the table-key-properties property of the stream’s metadata:

Primary Keys in a database stream
{
  "selected": null,
  "stream_id": 2289176,
  "tap_stream_id": "demni2mf59dt10-heroku-orders",
  "stream_name": "orders",
  "metadata": {
    "database-name": "demni2mf59dt10",
    "selected": null,
    "replication-method": null,
    "is-view": false,
    "row-count": 447,
    "schema-name": "heroku",
    "table-key-properties": [
      "id"
    ]
  }
}

Database views

For database views, the stream’s metadata will contain an is-view property with a value of true:

Primary Keys in a database view (stream)
{
  "selected": true,
  "stream_id": 2375830,
  "tap_stream_id": "demni2mf59dt10-public-customer_view",
  "stream_name": "customer_view",
  "metadata": {
    "database-name": "demni2mf59dt10",
    "selected": true,
    "is-view": true,
    "replication-key": "updated_at",
    "replication-method": "INCREMENTAL",
    "row-count": 56,
    "schema-name": "public",
    "table-key-properties": [],
    "view-key-properties": [
      "id"
    ]
  }
}

Primary Key information must be provided in the view-key-properties metadata property when the stream is selected for replication.

SaaS sources

For SaaS sources, Primary Keys are typically hard-coded in the Singer tap backing the source. The Primary Key field names are stored as a list in the table-key-properties property of the stream’s metadata:

Primary Keys in a SaaS stream
{
  "selected": null,
  "stream_id": 2288758,
  "tap_stream_id": "custom_collections",
  "stream_name": "custom_collections",
  "metadata": {
    "forced-replication-method": "INCREMENTAL",
    "selected": null,
    "table-key-properties": [
      "id"
    ],
    "valid-replication-keys": [
      "updated_at"
    ]
  }
}
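
As a rough illustration of the cases above, the sketch below reads the Primary Key field names from a stream object, using view-key-properties for views and table-key-properties otherwise. The function name primary_key_fields is hypothetical and not part of the Connect API; the stream dictionaries are trimmed copies of the examples above.

```python
from typing import Any


def primary_key_fields(stream: dict[str, Any]) -> list[str]:
    """Return the Primary Key field names recorded in a stream object.

    Hypothetical helper: it only reads the metadata properties shown in
    the examples above and makes no API calls.
    """
    metadata = stream.get("metadata") or {}
    # Views carry their keys in view-key-properties; tables and SaaS
    # streams use table-key-properties.
    if metadata.get("is-view"):
        return list(metadata.get("view-key-properties") or [])
    return list(metadata.get("table-key-properties") or [])


view_stream = {
    "stream_name": "customer_view",
    "metadata": {
        "is-view": True,
        "table-key-properties": [],
        "view-key-properties": ["id"],
    },
}
saas_stream = {
    "stream_name": "custom_collections",
    "metadata": {"table-key-properties": ["id"]},
}

print(primary_key_fields(view_stream))
print(primary_key_fields(saas_stream))
```

An empty result for a view that is about to be selected suggests that view-key-properties still needs to be provided.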

Replication Key fields

If a stream’s replication-method is INCREMENTAL, an appropriate field must be set as the stream’s Replication Key. Replication Keys are columns used to identify new and updated data for replication. These are typically integer, datetime, or timestamp columns and are required to use Key-based Incremental Replication.

Like Primary Keys, this is handled in one of several ways depending on the source type.

Database sources

For database sources, a valid Replication Key must be provided using the replication-key metadata property when the stream is selected.

Replication Keys in a database stream
{
  "selected": null,
  "stream_id": 2289176,
  "tap_stream_id": "demni2mf59dt10-heroku-orders",
  "stream_name": "orders",
  "metadata": {
    "database-name": "demni2mf59dt10",
    "selected": null,
    "replication-method": null,
    "is-view": false,
    "row-count": 447,
    "schema-name": "heroku",
    "table-key-properties": [
      "id"
    ]
  }
}

Note: This also applies to database views whose replication-method is set to INCREMENTAL.
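
A minimal sketch of the stream-level metadata values involved when selecting a database stream for Key-based Incremental Replication. The helper name incremental_selection_metadata is hypothetical; the property names come from the examples in this guide, and how the metadata is submitted to the Connect API is deliberately not shown.

```python
def incremental_selection_metadata(replication_key: str) -> dict:
    """Build stream-level metadata for an INCREMENTAL database stream.

    Hypothetical helper: it only assembles the properties named in this
    guide and does not send anything to the API.
    """
    if not replication_key:
        raise ValueError("INCREMENTAL replication needs a replication-key")
    return {
        "selected": True,
        "replication-method": "INCREMENTAL",
        "replication-key": replication_key,
    }


metadata = incremental_selection_metadata("updated_at")
print(metadata)
```

Raising early on a missing key keeps an incomplete selection from being sent at all.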

SaaS sources

For SaaS sources, Replication Keys are hard-coded in the Singer tap backing the source. The names of the fields that can be used as Replication Keys are stored as a list in the valid-replication-keys property of the stream’s metadata:

Replication Keys in a SaaS stream
{
  "selected": null,
  "stream_id": 2288758,
  "tap_stream_id": "custom_collections",
  "stream_name": "custom_collections",
  "metadata": {
    "forced-replication-method": "INCREMENTAL",
    "selected": null,
    "table-key-properties": [
      "id"
    ],
    "valid-replication-keys": [
      "updated_at"
    ]
  }
}

Note: When selecting fields in SaaS streams with a valid-replication-keys property, you must explicitly set the stream’s replication-key to a field in the valid-replication-keys property. Selecting this field for replication won’t automatically set the field as the stream’s Replication Key.
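
The check described in the note above could be sketched as follows. The function name check_saas_replication_key is hypothetical, and the sketch assumes the stream-level metadata has already been fetched and parsed into a dictionary.

```python
def check_saas_replication_key(metadata: dict) -> list[str]:
    """Return a list of problems with a SaaS stream's replication-key.

    Hypothetical helper: it compares replication-key against
    valid-replication-keys and reports anything that needs attention.
    """
    valid_keys = metadata.get("valid-replication-keys") or []
    if not valid_keys:
        # Nothing to validate against for this stream.
        return []
    key = metadata.get("replication-key")
    if key is None:
        return ["replication-key is not set; choose one of: " + ", ".join(valid_keys)]
    if key not in valid_keys:
        return [f"replication-key {key!r} is not in valid-replication-keys"]
    return []


stream_metadata = {
    "forced-replication-method": "INCREMENTAL",
    "selected": True,
    "table-key-properties": ["id"],
    "valid-replication-keys": ["updated_at"],
}

for problem in check_saas_replication_key(stream_metadata):
    print(problem)
```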


Field selection rules

Stitch requires Primary Key and Replication Key fields in streams to be selected in order to successfully and accurately replicate data.

To ensure the required fields are included in a stream’s field inclusion list, Stitch enforces field selection rules.

Metadata in field selection

Field selection rules are shaped by three metadata fields in a Field-level Metadata object:

inclusion
STRING
READ-ONLY

Indicates when a field will be included. Possible values are:

  • automatic - The field is included all the time, regardless of selected-by-default and selected values.
  • available - The field is available for selection. The field will be included if selected-by-default or selected is true.
  • unsupported - The field is unsupported and will not be included, regardless of selected-by-default and selected values.

selected-by-default
BOOLEAN
READ-ONLY

Indicates if a field will be selected by default. Possible values are:

  • null - The value has not been set
  • true - The field is selected by default and is included regardless of the selected value
  • false - The field is not selected by default. The field will be included if the selected value is true.

selected
BOOLEAN

Indicates whether a field should be selected. Possible values are:

  • null - The value has not been set
  • true - The field is selected
  • false - The field is not selected
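
The sketch below is a literal reading of the three property descriptions above, not confirmed API behavior. The function name is_included is hypothetical. In particular, it treats a field with selected set to false and selected-by-default set to true as included, because the descriptions say selected-by-default applies regardless of the selected value; that case is worth verifying against the API before relying on it.

```python
from typing import Optional


def is_included(
    inclusion: str,
    selected: Optional[bool],
    selected_by_default: Optional[bool],
) -> bool:
    """Decide whether a field is included, per the descriptions above.

    Hypothetical helper based only on the wording in this guide.
    """
    if inclusion == "automatic":
        return True
    if inclusion == "unsupported":
        return False
    if inclusion == "available":
        # Included if either flag is explicitly true; null (None) and
        # false both count as "not true".
        return selected is True or selected_by_default is True
    raise ValueError(f"unknown inclusion value: {inclusion!r}")


print(is_included("automatic", False, False))
print(is_included("available", None, None))
```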

Field selection metadata combinations

Below are the possible combinations of metadata field values and whether a field with those settings will be replicated.

Note: A * in the table indicates any possible value (null, true, or false) for the metadata field.

inclusion    selected  selected-by-default  replicated?
automatic    *         *                    yes
unsupported  *         *                    no
available    true      null                 yes
available    true      true                 yes
available    true      false                yes
available    false     null                 no
available    false     true                 yes
available    false     false                no
available    null      true                 yes
available    null      false                no
available    null      null                 no

Field compatibility rules

While all fields are subject to field selection rules, some fields are also subject to field compatibility rules. This means that certain combinations of fields cannot be selected together in a single stream.

These restrictions primarily affect SaaS sources like Microsoft Advertising (formerly Bing Ads), Google Analytics, or Google AdWords, and are set by the source.

Field exclusion metadata

If a field is subject to compatibility rules, its Field-level Metadata object will contain a fieldExclusions property. This property contains a list of arrays, each of which is the breadcrumb of an incompatible field.

For example, below is the field-level metadata for the DeviceOS field in the Microsoft Advertising (formerly Bing Ads) ad_group_performance_report stream:

Example field-level metadata for a Microsoft Advertising field
{
  "metadata": {
    "fieldExclusions": [
      [
        "properties",
        "ExactMatchImpressionSharePercent"
      ],
      [
        "properties",
        "ImpressionLostToAdRelevancePercent"
      ],
      [
        "properties",
        "ImpressionLostToBidPercent"
      ],
      [
        "properties",
        "ImpressionLostToBudgetPercent"
      ],
      [
        "properties",
        "ImpressionLostToExpectedCtrPercent"
      ],
      [
        "properties",
        "ImpressionLostToRankPercent"
      ],
      [
        "properties",
        "ImpressionSharePercent"
      ]
    ],
    "inclusion": "available"
  }
}

This indicates that when the DeviceOS field is selected, the fields listed in the fieldExclusions property cannot also be selected.
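
One way to check a planned selection against fieldExclusions before sending it is sketched below. The function name exclusion_conflicts is hypothetical, as is the Clicks field used in the sample selection. The sketch assumes field-level metadata has been collected into a dictionary keyed by field name, and that the last element of each breadcrumb is the excluded field's name, as in the example above.

```python
from typing import Any


def exclusion_conflicts(
    selected_fields: set[str],
    field_metadata: dict[str, dict[str, Any]],
) -> list[tuple[str, str]]:
    """Return (field, excluded_field) pairs that are both selected.

    Hypothetical helper: it only inspects fieldExclusions breadcrumbs
    and makes no API calls.
    """
    conflicts = []
    for name in sorted(selected_fields):
        exclusions = field_metadata.get(name, {}).get("fieldExclusions") or []
        for breadcrumb in exclusions:
            # A breadcrumb looks like ["properties", "<field name>"].
            if breadcrumb and breadcrumb[-1] in selected_fields:
                conflicts.append((name, breadcrumb[-1]))
    return conflicts


field_metadata = {
    "DeviceOS": {
        "fieldExclusions": [
            ["properties", "ImpressionSharePercent"],
            ["properties", "ImpressionLostToBidPercent"],
        ],
        "inclusion": "available",
    }
}
# "Clicks" is a placeholder field name used only for illustration.
planned = {"DeviceOS", "ImpressionSharePercent", "Clicks"}

print(exclusion_conflicts(planned, field_metadata))
```

An empty list means no selected field appears in another selected field's fieldExclusions.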

Google Analytics field compatibility

Google Analytics sources are an exception to the behavior described in the previous section. Fields in these sources are still subject to compatibility rules, but their field-level metadata won’t contain a fieldExclusions property.

To determine what fields are compatible, we recommend using Google’s Dimensions and Metrics Explorer before sending field selection requests to the API.

Field exclusion violations

The Connect API may allow you to select fields that violate field exclusion/compatibility rules, but doing so will likely result in extraction job failures.

To avoid this scenario, Stitch recommends checking fieldExclusions, if available, when building your own application. For Google Analytics sources, use Google’s Dimensions and Metrics Explorer to determine field compatibility.