An open-source relational database, PostgreSQL is a powerful and well-known system that has received recognition from both its users and the industry at large. Unlike some other database systems, PostgreSQL is completely customizable and yours to do with as you please (assuming, of course, that your instance is self-hosted).
For a more in-depth look at PostgreSQL, click here.
In this guide, we'll walk you through:
- PostgreSQL's pricing model,
- Some high-level limitations (including any incompatible data sources),
- How to spin up a PostgreSQL data warehouse of your own, and
- How Stitch loads and organizes data in PostgreSQL.
Pricing for PostgreSQL depends entirely on where your instance is hosted.
- Self-hosted: PostgreSQL is open-source, meaning you don’t need to pay an upfront cost to obtain the necessary software. You may, however, have hosting and maintenance costs associated with the server housing the instance. You may have to do a little bit of internal number crunching to figure out these potential costs.
- Heroku: Heroku has a variety of plans to choose from, and a guide to help you select the right plan for you or your company.
- Amazon RDS: Amazon offers a variety of plans for both on-demand instances and Multi-AZ Deployment. To get an estimate of what your monthly bill might look like, check out their monthly calculator.
Google CloudSQL PostgreSQL: Unlike many other cloud-based data warehouse solutions, Google’s pricing model is based on usage and not a fixed-rate. This means that your bill can vary over time.
Before fully committing yourself to using Google CloudSQL PostgreSQL as your data warehouse, we recommend familiarizing yourself with Google’s pricing model and using their pricing calculator to estimate your potential costs.
Every database has its own supported limits and way of handling data, and PostgreSQL is no different. The table below provides a very high-level look at what PostgreSQL supports, including any possible incompatibilities with Stitch’s integration offerings.
|Incompatibile Sources||Possible incompatibilities. Learn more.|
PostgreSQL’s full list; Stitch reserves
|Table Name Length||63 characters||Column Name Length||59 characters|
|Max # of Columns||250-1,600||VARCHAR Max Width||None|
|Nested Structure Support||None; structures will be de-nested. Learn more.||Case||Case Sensitive|
Not sure if PostgreSQL is the data warehouse for you? Check out the Choosing a Stitch Destination guide to compare each of Stitch’s destination offerings.
When it comes to setting up, hosting, and connecting a PostgreSQL data warehouse to Stitch, you have a few options:
- Connect a self-hosted PostgreSQL database. If you currently use PostgreSQL, you can create a database for Stitch within the instance and connect it.
- Connect a Heroku-PostgreSQL database. Heroku offers a managed PostgreSQL-based database service, which is easy to set up and connect to Stitch. No technical expertise is required to create the database or connect it to Stitch.
- Connect an Amazon PostgreSQL-RDS database. Amazon Web Services’ RDS service allows you to create and connect a PostgreSQL database to Stitch.
- Connect a Google CloudSQL PostgreSQL database. Google’s fully-managed CloudSQL database service takes only minutes to set up and provision. While no technical expertise is required to set up a simple instance, additional configuration options - such as assigning specific database permissions - is available for more advanced users.
After you’ve successfully connected your PostgreSQL data warehouse to Stitch, you can start adding integrations and replicating data.
For each integration that you add to Stitch, a schema specific to that integration will be created in your data warehouse. This is where all the tables for that inegration will be stored.
Stitch will encounter dozens of scenarios when replicating and loading your data. To learn more about how PostgreSQL handles these scenarios, check out the Data Loading Guide for PostgreSQL.
Rejected Records Log
Occasionally, Stitch will encounter data that it can’t load into the data warehouse. For example: a table contains more columns than PostgreSQL’s allowed limit of 250-1,600 columns per table.
When this happens, the data will be “rejected” and logged in a table called
_sdc_rejected. Every integration schema created by Stitch will include this table as well as the other tables in the integration.
Stitch replicates data from your sources based on the integration’s Replication Frequency and the Replication Method used by the tables in the integration. In Stitch, you have the ability to control what and how often data is replicated for the majority of integrations.
The time from the sync start to data being loaded into your data warehouse can vary depending on a number of factors, especially for initial historical loads.
To learn more about Stitch’s replication process and how loading time can be affected, check out the Stitch Replication Process guide.
Did this article help? If you have questions or feedback, please reach out to us.