ETL Database

Your central database for all things ETL: advice, suggestions, and best practices

ETL Transform

Transformation refers to the cleansing and aggregation that may need to happen to data to prepare it for analysis. Architecturally speaking, there are two ways to approach ETL transformation:

  • Multistage data transformation – This is the classic extract, transform, load process. Extracted data is moved to a staging area where transformations occur prior to loading the data into the warehouse.
  • In-warehouse data transformation – In this approach, the process flow changes to something more like ELT. Data is extracted and loaded into the analytics warehouse, and transformations are done there.

In recent years, there has been a shift toward transforming data within the warehouse rather than transforming it beforehand, primarily driven by two factors:

  1. The increased performance and scalability of the modern analytics database
  2. The ability for in-database transformations to be written in SQL, the data manipulation language of choice for most analysts

Today, this approach has enough advantages that it has become the default; when projects decide to use the traditional model, this decision should come with real consideration of the costs and benefits.

ETL in minutes >

Extract from the sources that run your business.

ETL transformation types

Regardless of where in the process transformation takes place, it’s an important step in the analytic workflow. Transformations prepare the data for analysis. Here are some of the most common types:

  • Basic transformations:

    • Cleaning: Mapping NULL to 0 or "Male" to "M" and "Female" to "F," date format consistency, etc.
    • Deduplication: Identifying and removing duplicate records
    • Format revision: Character set conversion, unit of measurement conversion, date/time conversion, etc.
    • Key restructuring: Establishing key relationships across tables
  • Advanced transformations:

    • Derivation Applying business rules to your data that derive new calculated values from existing data – for example, creating a revenue metric that subtracts taxes
    • Filtering: Selecting only certain rows and/or columns
    • Joining: Linking data from multiple sources – for example, adding ad spend data across multiple platforms, such as Google Adwords and Facebook Ads
    • Splitting: Splitting a single column into multiple columns
    • Data validation: Simple or complex data validation – for example, if the first three columns in a row are empty then reject the row from processing
    • Summarization: Values are summarized to obtain total figures which are calculated and stored at multiple levels as business metrics – for example, adding up all purchases a customer has made to build a customer lifetime value (CLV) metric
    • Aggregation: Data elements are aggregated from multiple data sources and databases
    • Integration: Give each unique data element one standard name with one standard definition. Data integration reconciles different data names and values for the same data element.
 

Extract data from the sources that run your business

Free 14-day trial. No credit card required

Keep Learning about ETL Transformation

Stream all your data to your data warehouse.

Select your integrations, choose your warehouse, and enjoy Stitch free for 14 days.

Set up in minutesUnlimited data volume during trial