ETL Database

Your central database for all things ETL: advice, suggestions, and best practices.

ETL Process

At its most basic, the ETL process encompasses data extraction, transformation, and loading. While the abbreviation implies a neat, three-step process – extract, transform, load – this simple definition doesn’t capture:

It’s best to think of ETL in the context of the business value it provides – ETL is the process by which data is extracted from data sources that are not optimized for analytics, moved to a central host, and optimized for analytics. The exact steps in that process might differ among ETL tools, but the end result is the same.

Traditional ETL process

Historically, the process has looked like this:

Illustration of the historical ETL process

Data is extracted from online transaction processing (OLTP) databases, today more commonly known just as transactional databases, and other data sources. Data is then transformed in a staging area. These transformations cover both data cleansing and optimizing the data for analysis. The transformed data is then loaded into an online analytical processing (OLAP) database, today more commonly known as just an analytics database.

Business intelligence (BI) teams then run queries on that data, which are eventually presented to end users, or to individuals responsible for making business decisions, or used as input for machine learning algorithms. One common problem encountered here is if the OLAP summaries can’t support the type of analysis the BI team wants to do, then the whole process needs to run again, this time with different transformations.

Modern ETL process

Modern technology has changed most organizations’ approach to ETL, for several reasons. The biggest is the advent of powerful analytics warehouses like Amazon Redshift and Google BigQuery. These newer cloud-based analytics databases have the horsepower to perform transformations in place rather than requiring a special staging area.

Also, data today is frequently analyzed in raw form rather than from preloaded OLAP summaries. This has led to the development of lightweight, flexible, and transparent ETL systems with processes that look something like this:

Illustration of the modern ETL process

The biggest advantage to this setup is that transformations and data modeling happen in the analytics database, in SQL. This gives the BI team, data scientists, and analysts greater control over how they work with it, in a common language they all understand.

Try Stitch Today

Critical ETL components

Regardless of the exact ETL process you choose, there are some critical components you’ll want to consider:

Click any of the buttons below for more detail about each step in the ETL process:

Keep Learning about the ETL Process