Stitch only supports connecting to Azure Synapse Analytics instances
Stitch’s Microsoft Azure Synapse Analytics destination only works with Microsoft’s Microsoft Azure Synapse Analytics.
Stitch doesn’t currently support using Azure SQL Server or Azure SQL Database as a destination.
Microsoft Azure Synapse Analytics is a fast, fully-managed, petabyte-scale data warehouse. It’s ideal for batch-based data warehouse workloads, and designed with a decoupled storage and compute model that allows it to scale quickly and be maintained in a cost-effective way.
This guide serves as a reference for version 1 of Stitch’s Microsoft Azure Synapse Analytics destination.
Details and features
High-level details about Stitch’s implementation of Microsoft Azure Synapse Analytics, such as supported connection methods, availability on Stitch plans, etc.
|Stitch plan availability||
All Stitch plans
|Stitch supported regions||
Operating regions determine the location of the resources Stitch uses to process your data. Learn more.
|Connect API availability||
This version of the Microsoft Azure Synapse Analytics destination can be created and managed using Stitch’s Connect API. Learn more.
Stitch supports using SSH tunnels to connect to Microsoft Azure Synapse Analytics destinations.
Stitch will attempt to use SSL to connect by default. No additional configuration is needed.
Virtual Private Network (VPN) connections may be implemented as part of a Premium plan. Contact Stitch Sales for more info.
|Static IP addresses||
This version of the Microsoft Azure Synapse Analytics destination has static IP addresses that can be whitelisted.
|Default loading behavior||
|Nested structure support||
Details about the destination, including object names, table and column limits, reserved keywords, etc.
Note: Exceeding the limits noted below will result in loading errors or rejected data.
|Maximum record size||
This limit is imposed by PolyBase, which is required to load data into Microsoft Azure Synapse Analytics destinations.
|Table name length||
|Column name length||
|Maximum columns per table||
|Maximum table size||
60 TB compressed on disk
|Maximum tables per database||
Refer to the Reserved keywords documentation.
Microsoft Azure Synapse Analytics pricing
Microsoft Azure Synapse Analytics bases their pricing on your compute and storage usage. Compute usage is charged using an hourly rate, meaning you’ll only be billed for the hours your data warehouse is active. Compute usage is billed in one hour increments.
Storage charges include the size of your primary database and seven days of incremental snapshots. Microsoft Azure rounds charges to the nearest terabyte (TB). For example: If the data warehouse is 1.5 TB and you have 100 GB of snapshots, your bill will be for 2 TB of data.
Refer to Microsoft’s documentation for more info and examples.
Replication process overview
Step 1: Data extraction
Stitch requests and extracts data from a data source. Refer to the System overview guide for a more detailed explanation of the Extraction phase.
Step 2: Stitch's internal pipeline
The data extracted from sources is processed by Stitch. Stitch’s internal pipeline includes the Prepare and Load phases of the replication process:
- Prepare: During this phase, the extracted data is buffered in Stitch’s durable, highly available internal data pipeline and readied for loading.
- Load: During this phase, the prepared data is transformed to be compatible with the destination, and then loaded. Refer to the Transformations section for more info about the transformations Stitch performs for Microsoft Azure Synapse Analytics destinations.
Refer to the System overview guide for a more detailed explanation of these phases.
Step 3: Load data into Azure Blob Storage
Stitch loads the extracted data into Azure Blob Storage.
Blob storage is intended for storing massive amounts of unstructured data. In the next step, Stitch will use Polybase to retrieve the data from Blob Storage and prepare it for loading into Microsoft Azure Synapse Analytics.
Step 4: Prep data using Polybase
Polybase is a Microsoft offering that integrates Microsoft SQL products with Hadoop. Polybase is needed to query data from Azure Blob Storage.
In this step, Stitch will perform the following:
- Create an external data source. This creates an external data source for the Polybase queries Stitch will run.
- Create an external file format. This creates an object that defines the external (extracted) data Stitch will load. This is used in the next step to create an external table.
- Create an external table. Using the external file format, this will create an external table. The external table is used to stage the data from Azure blob storage and load it into your Microsoft Azure Synapse Analytics data warehouse.
Step 5: Insert data into the data warehouse
Lastly, Stitch will insert the data from the external table in Polybase into your Microsoft Azure Synapse Analytics data warehouse.
By default, Stitch will use Upsert loading when loading data into Microsoft Azure Synapse Analytics.
If the conditions for Upsert loading aren’t met, data will be loaded using Append-Only loading.
Refer to the Understanding loading behavior guide for more info and examples.
Stitch requires Primary Keys to de-dupe incrementally replicated data. To ensure Primary Key data is available, Stitch creates an
_sdc_primary_keystable in every integration dataset. This table contains a list of all tables in an integration’s dataset and the columns those tables use as Primary Keys.
Refer to the _sdc_primary_keys table documentation for more info.
Note: Removing or altering this table can lead to replication issues.
No compatibility issues have been discovered between Microsoft Azure Synapse Analytics and Stitch's integration offerings.
System tables and columns
Stitch will create the following tables in each integration’s dataset:
Additionally, Stitch will insert system columns (prepended with
_sdc) into each table.
Stitch converts data types only where needed to ensure the data is accepted by Microsoft Azure Synapse Analytics. In the table below are the data types Stitch supports for Microsoft Azure Synapse Analytics destinations, and the Stitch types they map to.
- Stitch type: The Stitch data type the source type was mapped to. During the Extraction and Preparing phases, Stitch identifies the data type in the source and then maps it to a common Stitch data type.
- Destination type: The destination-compatible data type the Stitch type maps to. This is the data type Stitch will use to store data in Microsoft Azure Synapse Analytics.
- Notes: Details about the data type and/or its allowed values in the destination, if available. If a range is available, values that exceed the noted range will be rejected by Microsoft Azure Synapse Analytics.
|Stitch type||Destination type||Notes|
Microsoft Azure Synapse Analytics destinations don’t have native support for nested data structures. To ensure nested data can be loaded, Stitch will flatten objects and arrays into columns and subtables, respectively. For more info and examples, refer to the Handling nested data structures guide.
Column names in Microsoft Azure Synapse Analytics:
Must be less than the maximum length of 128 characters. Columns that exceed this limit will be rejected by Microsoft Azure Synapse Analytics.
Must not be prefixed or suffixed with any of Microsoft Azure Synapse Analytics’s or Stitch’s reserved keyword prefixes or suffixes
Stitch will perform the following transformations to ensure column names adhere to the rules imposed by Microsoft Azure Synapse Analytics:
|Transformation||Source column||Destination column|
|Convert spaces to underscores||
|Convert special characters to underscores||
|Convert leading numbers to underscores||
Microsoft Azure Synapse Analytics will store the value in UTC with the specified offset.
Not sure if Microsoft Azure Synapse Analytics is the destinationfor you? Check out the Choosing a Stitch Destination guide to compare each of Stitch’s destination offerings.