The availability of cloud data warehouses that store and process data cost-effectively is changing the way companies manage their analytical data. The shift from on-premises servers toward cloud data warehouses is sparking a shift from ETL to ELT.
What is ELT?
ELT stands for “extract, load, and transform” — the processes a data pipeline uses to replicate data from a source system into a target system such as a cloud data warehouse.
- Extraction: This first step involves copying data from the source system.
- Loading: During the loading step, the pipeline replicates data from the source into the target system, which might be a data warehouse or data lake.
- Transformation: Once the data is in the target system, organizations can run whatever transformations they need. Often organizations will transform raw data in different ways for use with different tools or business processes.
ELT — the next generation of ETL
ELT is a modern variation on the older process of extract, transform, and load (ETL), in which transformations take place before the data is loaded. Running transformations before the load phase results in a more complex data replication process.
ETL tools require processing engines for running transformations prior to loading data into a destination. With ELT, on the other hand, businesses use the processing engines in the destinations to efficiently transform data within the target system itself. This removal of an intermediate step streamlines the data loading process.
Because ETL transforms data prior to the loading stage, it’s the ideal process when a destination requires a specific data format. This could include when there’s a misalignment in supported data types between the source and destination, limited ability to quickly scale processing in a destination, or security restrictions make it impossible to store raw data in a destination.
However, when the destination is a cloud-native data warehouse like Amazon Redshift, Google BigQuery, Snowflake, and Microsoft Azure SQL Data Warehouse, ELT is a better approach. Organizations can transform their raw data at any time, when and as necessary for their use case, and not as a step in the data pipeline.
Business benefits of ELT
ELT and cloud-based data warehouses and data lakes have several benefits over ETL and on-premises hardware.
Time to value
ELT generally provides faster time to value, which means business intelligence is available more quickly. By contrast, ETL requires a time-intensive and resource-heavy transformation step prior to loading or integrating data.
ELT tools are used in conjunction with cloud data warehouses that are designed to autoscale in the face of increased processing loads. Cloud platforms allow for almost unlimited scale within seconds or minutes, whereas older generations of on-premises data warehouses require organizations to order, install, and configure new hardware.
Many BI tools determine schema on read and produce transformations on demand, so you can replicate raw data into your data warehouse or data lake and transform it when and however you need to.
ELT in the real world — “Liv-ing up” to data integration expectations
The Brazilian food tech startup Liv Up provides an example of the effectiveness of ELT in data warehouses. The company integrates data from a variety of sources — including MongoDB, Google Analytics, and Zendesk — into its data warehouse. In the past, the process was effective but cumbersome.
Data from MongoDB was especially challenging because it required translating NoSQL data into a relational data structure. With traditional ETL, it took the company about a month to write the code for their data pipeline. Liv Up needed to reduce the time to value and get their data to its destination more quickly.
Liv Up turned to Stitch to simplify and expedite the data replication process. Stitch reduced the time it took to extract and load the company’s data by 8 hours a week. Better yet, Liv Up benefited from the ability to build their own transformation phase to easily leverage BI tools inherent to their company.
Stitch — The fastest way to get your data to its destination
Stitch makes it easy to extract data from more than 90 sources and move it to your target destination. Sign up for a free trial and get your data to its destination in minutes.