Stream all your data to your data warehouse.
Select your integrations, choose your warehouse, and enjoy Stitch free for 14 days.
Set up in minutes. Unlimited data volume during trial. 5 million rows of data free, forever.
Your central database for all things ETL: advice, suggestions, and best practices.
The load stage of the ETL process depends largely on what you intend to do with the data once it’s loaded into the data warehouse. Uses could include:
Regardless of your end goal, one of the key considerations during the load process is understanding the work you’re requiring of the target environment. Depending on your data volume, structure, target, and load type, you could negatively impact the host system when you load data.
For example, Amazon Redshift is best loaded infrequently and in large batches. Small, frequent batches can consume the cluster's resources, and you’ll have angry analysts beating down your door when they notice that your jobs are slowing down their queries.
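One way to picture the "large, infrequent batches" advice is a small buffer that holds rows until a size threshold is reached and only then hands them to the warehouse. The sketch below is illustrative only: `BatchLoader`, `flush_fn`, and the batch size are hypothetical names and values, and the flush step here just records the batch size instead of talking to a real warehouse.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class BatchLoader:
    """Buffers rows and passes them to `flush_fn` only in large batches.

    `flush_fn` is a hypothetical callback standing in for whatever
    actually writes a batch to the target warehouse.
    """

    def __init__(self, flush_fn: Callable[[List[Row]], None], batch_size: int = 100_000):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.flush_fn = flush_fn
        self.batch_size = batch_size
        self._buffer: List[Row] = []

    def add(self, row: Row) -> None:
        # Collect the row; only touch the warehouse once the buffer is full.
        self._buffer.append(row)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Send whatever is buffered (if anything) as a single batch.
        if self._buffer:
            self.flush_fn(self._buffer)
            self._buffer = []


# Usage: 2,500 rows with a batch size of 1,000 produce three loads, not 2,500.
loaded_batches: List[int] = []
loader = BatchLoader(flush_fn=lambda batch: loaded_batches.append(len(batch)), batch_size=1000)
for i in range(2500):
    loader.add({"id": i})
loader.flush()  # load the final partial batch
print(loaded_batches)  # [1000, 1000, 500]
```

For Redshift specifically, the flush step would typically stage the batch as a file and issue a `COPY` command. That part is left out here because it depends on your environment.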
Bottom line: The load process needs to be specific to what you’re loading data into. We’re going to move forward with the assumption that you’re loading data into an analytics warehouse.
There are two primary methods to load data into a warehouse:
| | Full load | Incremental load |
|---|---|---|
| Rows synced | All rows in source data | New and updated records only |
| Time | More time | Less time |
| Difficulty | Low | High. The ETL process must check for new and updated rows, and recovery from an issue is harder |
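
The difference between the two methods can be sketched with two in-memory SQLite databases standing in for the source and the warehouse. This is a minimal illustration under stated assumptions, not a production pattern: the `users` table, its `updated_at` column, and the use of the newest `updated_at` in the target as a high-water mark are all assumptions made for the example.

```python
import sqlite3


def full_load(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Replace the target table with every row from the source."""
    rows = source.execute("SELECT id, name, updated_at FROM users").fetchall()
    target.execute("DELETE FROM users")
    target.executemany(
        "INSERT INTO users (id, name, updated_at) VALUES (?, ?, ?)", rows
    )
    target.commit()
    return len(rows)


def incremental_load(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Copy only rows changed since the newest updated_at already loaded."""
    (watermark,) = target.execute(
        "SELECT COALESCE(MAX(updated_at), 0) FROM users"
    ).fetchone()
    rows = source.execute(
        "SELECT id, name, updated_at FROM users WHERE updated_at > ?", (watermark,)
    ).fetchall()
    # INSERT OR REPLACE acts as an upsert keyed on the primary key,
    # so updated rows overwrite their old versions.
    target.executemany(
        "INSERT OR REPLACE INTO users (id, name, updated_at) VALUES (?, ?, ?)", rows
    )
    target.commit()
    return len(rows)


schema = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
source.execute(schema)
target.execute(schema)
source.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "ada", 100), (2, "bob", 100), (3, "cy", 100)],
)
source.commit()

full_count = full_load(source, target)  # initial load moves all rows

# The source changes: one row is updated and one row is added.
source.execute("UPDATE users SET name = 'bobby', updated_at = 200 WHERE id = 2")
source.execute("INSERT INTO users VALUES (4, 'dee', 200)")
source.commit()

incremental_count = incremental_load(source, target)  # moves only the changes
print(full_count, incremental_count)  # 3 2
```

Even this small sketch hints at why the incremental path is harder. It depends on a trustworthy `updated_at` value, it never notices rows deleted from the source, and a row written with a timestamp equal to the current watermark would be skipped by the strict `>` comparison.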
The initial full load is relatively straightforward. When you start taking on incremental loads, things get more complex. Here are three of the most common problem areas:
Any of these problems will likely result in data that is either incomplete or wrong. Recovering from these issues can be a massive headache.