Microsoft Azure SQL Data Warehouse is a fast, fully-managed, petabyte-scale data warehouse. It’s ideal for batch-based data warehouse workloads, and designed with a decoupled storage and compute model that allows it to scale quickly and be maintained in a cost-effective way.

This guide serves as a reference for version 1 of Stitch’s Microsoft Azure SQL Data Warehouse destination.


Details and features

Stitch features

High-level details about Stitch’s implementation of Microsoft Azure SQL Data Warehouse, such as supported connection methods, availability on Stitch plans, etc.

Release status

Released

Stitch plan availability

All Stitch plans

Supported versions

Not applicable

Connect API availability Supported

This version of the Microsoft Azure SQL Data Warehouse destination can be created and managed using Stitch’s Connect API. Learn more.

SSH connections Supported

Stitch supports using SSH tunnels to connect to Microsoft Azure SQL Data Warehouse destinations.

SSL connections Supported

Stitch will attempt to use SSL to connect by default. No additional configuration is needed.

VPN connections Unsupported

Virtual Private Network (VPN) connections may be implemented as part of an Enterprise plan. Contact Stitch Sales for more info.

Default loading behavior

Upsert
Note: Append-Only loading will be used if all conditions for Upsert are not met. Learn more.

Nested structure support

Unsupported
Nested data structures will be flattened into relational objects. Learn more.

Destination details

Details about the destination, including object names, table and column limits, reserved keywords, etc.

Note: Exceeding the limits noted below will result in loading errors or rejected data.

Maximum record size

1MB

This limit is imposed by PolyBase, which is required to load data into Microsoft Azure SQL Data Warehouse destinations.

Table name length

112 characters

Column name length

128 characters

Maximum table size

60 TB compressed on disk

Maximum tables per database

10,000

Case sensitivity

Insensitive

Reserved keywords

Refer to the Reserved keywords documentation.

Microsoft Azure SQL Data Warehouse pricing

Microsoft Azure SQL Data Warehouse bases its pricing on your compute and storage usage. Compute usage is billed at an hourly rate in one-hour increments, meaning you’ll only be billed for the hours your data warehouse is active.

Storage charges include the size of your primary database and seven days of incremental snapshots. Microsoft Azure rounds charges to the nearest terabyte (TB). For example: If the data warehouse is 1.5 TB and you have 100 GB of snapshots, your bill will be for 2 TB of data.

Refer to Microsoft’s documentation for more info and examples.
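As a sketch of the rounding rule above (assuming, per the example, that charges round to the nearest whole terabyte; the helper function is hypothetical):

```python
# Hypothetical sketch of Azure storage billing rounding, per the
# example above. Assumes charges round to the nearest whole terabyte.

def billed_storage_tb(primary_db_tb, snapshots_tb):
    """Return the number of terabytes billed for storage."""
    total_tb = primary_db_tb + snapshots_tb
    # Round the combined size to the nearest TB.
    return round(total_tb)

# 1.5 TB primary database + 100 GB (0.1 TB) of snapshots bills as 2 TB.
print(billed_storage_tb(1.5, 0.1))  # → 2
```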


Replication

Replication process overview

Step 1: Data extraction

Stitch requests and extracts data from a data source. Refer to the System overview guide for a more detailed explanation of the Extraction phase.

Step 2: Stitch's internal pipeline

The data extracted from sources is processed by Stitch. Stitch’s internal pipeline includes the Prepare and Load phases of the replication process:

  • Prepare: During this phase, the extracted data is buffered in Stitch’s durable, highly available internal data pipeline and readied for loading.
  • Load: During this phase, the prepared data is transformed to be compatible with the destination, and then loaded. Refer to the Transformations section for more info about the transformations Stitch performs for Microsoft Azure SQL Data Warehouse destinations.

Refer to the System overview guide for a more detailed explanation of these phases.

Step 3: Load data into Azure Blob Storage

Stitch loads the extracted data into Azure Blob Storage.

Blob storage is intended for storing massive amounts of unstructured data. In the next step, Stitch will use PolyBase to retrieve the data from Azure Blob Storage and prepare it for loading into Microsoft Azure SQL Data Warehouse.

Step 4: Prep data using PolyBase

PolyBase is a Microsoft offering that integrates Microsoft SQL products with Hadoop. PolyBase is needed to query data from Azure Blob Storage.

In this step, Stitch will perform the following:

  1. Create an external data source. This creates an external data source for the PolyBase queries Stitch will run.
  2. Create an external file format. This creates an object that defines the external (extracted) data Stitch will load. This is used in the next step to create an external table.
  3. Create an external table. Using the external file format, this will create an external table. The external table is used to stage the data from Azure Blob Storage and load it into your data warehouse.

Step 5: Insert data into the data warehouse

Lastly, Stitch will insert the data from the PolyBase external table into your data warehouse.
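Steps 4 and 5 can be sketched as T-SQL assembled in Python. The object names (stitch_blob_source, stitch_file_format, and so on) are hypothetical, not Stitch's actual internal names; the statement shapes follow Microsoft's documented PolyBase DDL:

```python
# Hypothetical sketch of the PolyBase statements described in steps 4-5.
# Object names are illustrative; the statement shapes follow Microsoft's
# documented T-SQL syntax for PolyBase external objects.

def polybase_load_script(table, columns, container_url):
    """Build the four T-SQL statements used to stage and insert one table."""
    column_defs = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return "\n\n".join([
        # 1. Point PolyBase at the Azure Blob Storage container.
        "CREATE EXTERNAL DATA SOURCE stitch_blob_source\n"
        f"WITH (TYPE = HADOOP, LOCATION = '{container_url}',\n"
        "      CREDENTIAL = stitch_storage_credential);",
        # 2. Describe the layout of the extracted files.
        "CREATE EXTERNAL FILE FORMAT stitch_file_format\n"
        "WITH (FORMAT_TYPE = DELIMITEDTEXT);",
        # 3. Stage the files in Blob Storage as an external table.
        f"CREATE EXTERNAL TABLE stitch_stage_{table} ({column_defs})\n"
        f"WITH (LOCATION = '/{table}/', DATA_SOURCE = stitch_blob_source,\n"
        "      FILE_FORMAT = stitch_file_format);",
        # 4. Insert from the external table into the warehouse table.
        f"INSERT INTO dbo.{table} SELECT * FROM stitch_stage_{table};",
    ])

script = polybase_load_script(
    "orders",
    [("id", "BIGINT"), ("total", "DECIMAL(18, 2)")],
    "wasbs://stitch@example.blob.core.windows.net",
)
print(script)
```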

Loading behavior

By default, Stitch will use Upsert loading when loading data into Microsoft Azure SQL Data Warehouse.

If the conditions for Upsert loading aren’t met, data will be loaded using Append-Only loading.

Refer to the Understanding loading behavior guide for more info and examples.
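The two behaviors can be sketched conceptually (this illustrates the semantics only, not Stitch's implementation; the Primary Key column name "id" is hypothetical):

```python
# Conceptual illustration of the two loading behaviors, keyed on a
# Primary Key column named "id" (hypothetical). Not Stitch's implementation.

def upsert(table, rows):
    """Upsert: a row replaces any existing row with the same Primary Key."""
    by_pk = {row["id"]: row for row in table}
    for row in rows:
        by_pk[row["id"]] = row
    return list(by_pk.values())

def append_only(table, rows):
    """Append-Only: every replicated version of a row is added as a new row."""
    return table + rows

first = [{"id": 1, "status": "pending"}]
update = [{"id": 1, "status": "shipped"}]

print(len(upsert(first, update)))       # 1 row: the update replaced the original
print(len(append_only(first, update)))  # 2 rows: both versions are kept
```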

Primary Keys

Stitch requires Primary Keys to de-dupe incrementally replicated data. To ensure Primary Key data is available, Stitch creates an _sdc_primary_keys table in every integration dataset. This table contains a list of all tables in an integration’s dataset and the columns those tables use as Primary Keys.

Refer to the _sdc_primary_keys table documentation for more info.

Note: Removing or altering this table can lead to replication issues.

Incompatible sources

No compatibility issues have been discovered between Microsoft Azure SQL Data Warehouse and Stitch's integration offerings.

See all destination and integration incompatibilities.


Transformations

System tables and columns

Stitch will create system tables, such as _sdc_primary_keys, in each integration’s dataset.

Additionally, Stitch will insert system columns (prepended with _sdc) into each table.

Data typing

Stitch converts data types only where needed to ensure the data is accepted by Microsoft Azure SQL Data Warehouse. In the table below are the data types Stitch supports for Microsoft Azure SQL Data Warehouse destinations, and the Stitch types they map to.

  • Stitch type: The Stitch data type the source type was mapped to. During the Extraction and Preparing phases, Stitch identifies the data type in the source and then maps it to a common Stitch data type.
  • Destination type: The destination-compatible data type the Stitch type maps to. This is the data type Stitch will use to store data in Microsoft Azure SQL Data Warehouse.
  • Notes: Details about the data type and/or its allowed values in the destination, if available. If a range is available, values that exceed the noted range will be rejected by Microsoft Azure SQL Data Warehouse.
Stitch type Destination type Notes
BIGINT BIGINT
  • Range : -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807)

BOOLEAN BIT
  • Description: Values are stored as 1 (true), 0 (false), or NULL.

DATE DATETIMEOFFSET
  • Description: Stored in UTC as DATETIMEOFFSET(3)

  • Range : Dates: 0001-01-01/January 1, 0001 AD, through 9999-12-31/December 31, 9999 AD; Time: 00:00:00 through 23:59:59.997

DOUBLE FLOAT
  • Description: Stored as FLOAT(53)

FLOAT FLOAT
  • Description: Stored as FLOAT(53)

INTEGER BIGINT
  • Range : -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807)

NUMBER DECIMAL
  • Range : Precision must be between 1 and 38; scale must be between 0 and the precision value

STRING NVARCHAR
  • Description: Columns will be automatically widened if the string length exceeds the current size. Note: String values that exceed the maximum width will be truncated.

  • Range : Maximum width is 4,000 characters

JSON structures

Microsoft Azure SQL Data Warehouse destinations don’t have native support for nested data structures. To ensure nested data can be loaded, Stitch will flatten objects and arrays into columns and subtables, respectively. For more info and examples, refer to the Handling nested data structures guide.
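A minimal sketch of the flattening described above: nested objects become prefixed columns, and arrays are split out as rows for a subtable. The column naming scheme (double underscore separator) is illustrative, not Stitch's exact convention:

```python
# Hypothetical sketch of nested data flattening: objects -> prefixed
# columns, arrays -> subtable rows. The double underscore separator is
# illustrative, not Stitch's exact naming scheme.

def flatten_record(record, prefix=""):
    """Flatten nested objects into columns; split arrays into subtable rows."""
    columns, subtables = {}, {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            nested_cols, nested_subs = flatten_record(value, f"{name}__")
            columns.update(nested_cols)
            subtables.update(nested_subs)
        elif isinstance(value, list):
            subtables[name] = value  # each element becomes a subtable row
        else:
            columns[name] = value
    return columns, subtables

cols, subs = flatten_record(
    {"id": 1, "address": {"city": "Boston"}, "items": [{"sku": "a"}]}
)
print(cols)  # {'id': 1, 'address__city': 'Boston'}
print(subs)  # {'items': [{'sku': 'a'}]}
```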

Column names

Stitch will perform the following transformations to ensure column names adhere to the rules imposed by Microsoft Azure SQL Data Warehouse:

  • Convert spaces to underscores: customer id → customer_id
  • Convert special characters to underscores: customer#id → customer_id; !CuStOMerID → _CuStOMerID
  • Convert leading numbers to underscores: 4customerid → _customerid
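A sketch of the transformations above (this is a hypothetical implementation approximating the listed rules, not Stitch's actual code):

```python
import re

# Hypothetical implementation approximating the column name
# transformations listed above; not Stitch's actual code.

def transform_column_name(name):
    """Rename a column to satisfy the rules listed above."""
    name = name.replace(" ", "_")                # spaces -> underscores
    name = re.sub(r"\W", "_", name)              # special chars -> underscores
    name = re.sub(r"^\d+",                       # leading digits -> underscores
                  lambda m: "_" * len(m.group()), name)
    return name

print(transform_column_name("customer id"))   # customer_id
print(transform_column_name("customer#id"))   # customer_id
print(transform_column_name("!CuStOMerID"))   # _CuStOMerID
print(transform_column_name("4customerid"))   # _customerid
```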

Timezones

Microsoft Azure SQL Data Warehouse will store the value in UTC with the specified offset.
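As an illustration of the conversion, here is a timestamp with a +05:00 offset and its UTC equivalent, expressed as Python datetimes (the specific timestamp is hypothetical):

```python
from datetime import datetime, timedelta, timezone

# Illustration of converting an offset timestamp to UTC; the specific
# timestamp is hypothetical.
local = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone(timedelta(hours=5)))
utc = local.astimezone(timezone.utc)

print(local.isoformat())  # 2024-01-01T12:00:00+05:00
print(utc.isoformat())    # 2024-01-01T07:00:00+00:00
```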


Compare destinations

Not sure if Microsoft Azure SQL Data Warehouse is the destination for you? Check out the Choosing a Stitch Destination guide to compare each of Stitch’s destination offerings.


Questions? Feedback?

Did this article help? If you have questions or feedback, feel free to submit a pull request with your suggestions, open an issue on GitHub, or reach out to us.