ETL Database

Your central database for all things ETL: advice, suggestions, and best practices

ETL Extract

During the extract phase of ETL, someone in the organization identifies the desired data sources and the rows, columns, and fields to be extracted from those sources. These sources likely include:

Part of the planning for this stage should include estimating data volumes from each data source. You’ll make significantly different plans for every stage of your ETL process if your data sizes are 100 gigabytes versus 100 petabytes.
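To see why volume changes the plan so much, a rough back-of-the-envelope calculation helps. The sketch below is only an illustration: the function name `transfer_hours` and the 1 Gbit/s sustained throughput are assumptions chosen for the example, not figures from any particular system.

```python
# Rough estimate of how long it takes to move a data set at a fixed throughput.
# ASSUMED_THROUGHPUT is a hypothetical figure; substitute your own measurement.

GB = 10**9   # bytes in a gigabyte (decimal)
PB = 10**15  # bytes in a petabyte (decimal)

ASSUMED_THROUGHPUT = 125 * 10**6  # 1 Gbit/s expressed as bytes per second


def transfer_hours(size_bytes: float, throughput_bytes_per_s: float) -> float:
    """Return the hours needed to move size_bytes at a sustained throughput."""
    return size_bytes / throughput_bytes_per_s / 3600


small = transfer_hours(100 * GB, ASSUMED_THROUGHPUT)
large = transfer_hours(100 * PB, ASSUMED_THROUGHPUT)

print(f"100 GB: {small * 60:.1f} minutes")
print(f"100 PB: {large / 24 / 365:.1f} years")
```

Under that assumed throughput, 100 gigabytes moves in roughly 13 minutes, while 100 petabytes would take about 25 years over the same link. At the larger size, parallel extraction, incremental loads, or physical transfer stop being optional.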

You must also extract data in a way that doesn’t degrade the performance or response times of the source systems.
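One common way to limit the load on a source database is to read in small batches and pause between them. The following is a minimal sketch of that idea, assuming a hypothetical `orders` table with a positive integer `id` primary key; the function name, batch size, and pause length are all illustrative, and SQLite stands in for whatever source system you actually use.

```python
import sqlite3
import time
from typing import Iterator, List, Tuple


def extract_in_batches(
    conn: sqlite3.Connection, batch_size: int = 1000, pause_s: float = 0.1
) -> Iterator[List[Tuple[int, str]]]:
    """Yield rows from the hypothetical `orders` table in id order.

    Each query fetches at most batch_size rows after the last id seen
    (keyset pagination), so no single query scans the whole table.
    """
    last_id = 0  # assumes ids are positive integers
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM orders WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]
        time.sleep(pause_s)  # give the source system room between batches


# Small in-memory demonstration with 25 rows.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT)")
demo.executemany(
    "INSERT INTO orders (payload) VALUES (?)", [(f"row-{i}",) for i in range(25)]
)

for batch in extract_in_batches(demo, batch_size=10, pause_s=0):
    print(len(batch), "rows, last id", batch[-1][0])
```

Filtering on `id > ?` rather than using `OFFSET` keeps each query cheap as the extraction moves deeper into the table. In practice you would tune the batch size and pause against the source system's observed response times.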

Data Extraction

Data extraction commonly happens in one of three ways:

In the past, the ETL process was largely concerned with extracting data from transactional databases. The prevalence of SaaS products has changed this. Many companies today rely on a host of SaaS tools – Salesforce, Google Analytics, Google AdWords, Facebook Ads, Zendesk, HubSpot, and many more – to run their businesses.

The extraction process for nearly every SaaS product relies on integrating with its APIs. APIs introduce a few challenges to the ETL process:

For example, Facebook’s “move fast and break things” approach to development means the company frequently updates its reporting APIs. It doesn’t notify API users in advance, so the only early warning your team is likely to get is a “through the grapevine” heads-up, and only if it has invested in building a close relationship with the team that builds the API.
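Because an API can change without warning, it can help to check each response against the shape your pipeline expects before loading it. The sketch below shows one simple way to do that. The field names (`campaign_id`, `impressions`, `spend`) and the `check_record` helper are hypothetical and do not reflect any real vendor's API.

```python
from typing import Any, Dict, List

# Hypothetical fields this pipeline expects from a reporting API,
# mapped to the Python type each should have after JSON decoding.
EXPECTED_FIELDS = {"campaign_id": str, "impressions": int, "spend": float}


def check_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of differences between a record and the expected shape."""
    problems = []
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            actual = type(record[name]).__name__
            problems.append(f"{name}: expected {expected_type.__name__}, got {actual}")
    # Fields the pipeline has never seen may signal a new API version.
    for name in sorted(record.keys() - EXPECTED_FIELDS.keys()):
        problems.append(f"unexpected field: {name}")
    return problems


# A record where one field changed type and another was renamed.
changed = {"campaign_id": "c1", "impressions": "10", "cost": 1.5}
for problem in check_record(changed):
    print(problem)
```

A check like this does not prevent breakage, but it turns a silent data-quality problem into an explicit alert at extraction time, when it is cheapest to investigate.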
