Amazon Redshift debuted in 2012 as the first cloud data warehouse, and remains the most popular one today. But Redshift wasn't developed in-house. Amazon invested $20 million in a company called ParAccel, and in return gained the license to use code from ParAccel Analytic Database (PADB) for Redshift. PADB was notable because it was a columnar database that ran on commodity hardware, which made it a natural choice as a basis for a cloud-based analytic database platform.
While that deal might have seemed worthwhile for ParAccel at the time, it worked out even better for Amazon in the long run. In 2015 Amazon CTO Werner Vogels called Redshift "the fastest-growing service in AWS, ever." Meanwhile, ParAccel was acquired by Actian in 2013, and PADB was renamed Actian Matrix.
PADB was itself based on PostgreSQL – so to some extent, Redshift is based on PostgreSQL – but "based on" leaves a lot of room for difference. If you're familiar with PostgreSQL features and syntax, how easy will it be to get used to Redshift?
First, there are architectural differences between Redshift and PostgreSQL:
The architectural changes Amazon made to Redshift make it better able to handle large volumes of data for analytical queries. PostgreSQL can serve as a data warehouse for smaller volumes of data, but it can't match the performance of Redshift's column-oriented architecture. But the good news is that if you're familiar with PostgreSQL commands and concepts, learning Redshift isn't like learning a foreign language. It's more like the difference between dialects – the American and British versions of English, for example.
Both databases use SQL as their native language. While a lot of the two platforms' SQL syntax is the same, there are plenty of differences as well. The list of Redshift SQL commands differs from the list of PostgreSQL commands, and even when both platforms implement the same command, their syntax is often different.
Per Amazon's documentation, here are some of the major differences between Redshift and PostgreSQL SQL commands:
CREATE TABLE Redshift doesn't support tablespaces, table partitioning, inheritance, and certain constraints. The Redshift implementation of CREATE TABLE enables you to define the sort and distribution algorithms for tables to optimize parallel processing.
ALTER TABLE Redshift doesn't support ALTER COLUMN actions, and ADD COLUMN supports adding only one column in each ALTER TABLE statement.
COPY The Redshift COPY command is specialized to enable loading of data from Amazon S3 buckets and Amazon DynamoDB tables and to facilitate automatic compression.
INSERT, UPDATE, and DELETE Redshift doesn't support the WITH clause.
VACUUM The parameters for VACUUM are different between the two databases. For example, the default VACUUM operation in PostgreSQL reclaims space and makes it available for reuse. The default VACUUM operation in Redshift is VACUUM FULL, which reclaims disk space and resorts all rows.
Amazon lists many PostgreSQL features and functions that are not supported in Redshift. In addition to the aforementioned lack of support for constraints and indexes, the list of unsupported features includes stored procedures, triggers, value expressions, and inheritance.
Redshift lacks support for several PostgreSQL data types:
Redshift does support these data types:
|SMALLINT||INT2||Signed two-byte integer|
|INTEGER||INT, INT4||Signed four-byte integer|
|BIGINT||INT8||Signed eight-byte integer|
|DECIMAL||NUMERIC||Exact numeric of selectable precision|
|REAL||FLOAT4||Single precision floating-point number|
|DOUBLE PRECISION||FLOAT8, FLOAT||Double precision floating-point number|
|BOOLEAN||BOOL||Logical Boolean (true/false)|
|CHAR||CHARACTER, NCHAR, BPCHAR||Fixed-length character string|
|VARCHAR||CHARACTER VARYING, NVARCHAR, TEXT||Variable-length character string with a user-defined limit|
|DATE||Calendar date (year, month, day)|
|TIMESTAMP||TIMESTAMP WITHOUT TIME ZONE||Date and time (without time zone)|
|TIMESTAMPTZ||TIMESTAMP WITH TIME ZONE||Date and time (with time zone)|
If you have SQL skills you developed from working with PostgreSQL, you'll be able to get by in Amazon Redshift pretty well – but you'll have to familiarize yourself with the differences between the two platforms. Redshift is optimized for analytical queries, and its commands and syntax reflect that. As long as you keep a command reference page bookmarked, you should be fine.