Data integration is the process of combining data from multiple, disparate sources into a data warehouse destination. It’s a key part of the process of turning raw data into insights that drive better, faster decision-making.
Data integration and related terms
Data integration is a broad topic, with an abundance of related concepts, so it’s easy to get confused. Let’s begin by defining some terms.
|Data integration||Data integration is the process of moving different sets of data from databases, SaaS platforms, and other sources into a single centralized repository.
Businesses choose to integrate data for a variety of reasons, including reducing data silos, updating legacy systems, and producing more thorough business intelligence (BI).
|Data replication||Data replication puts copies of data in different locations, generally to improve data availability or system resilience. In a data analytics environment, organizations replicate data from transactional databases to analytical repositories to facilitate BI reporting.|
|Data migration||Data migration is a one-time event in which data is moved from one platform to another, and the destination becomes the new source of the data.|
|Data ingestion||Data ingestion describes the process of a database accepting data from another source. Before data can be used for BI, it must be ingested.|
|Extraction||Data extraction refers to the process of targeting and retrieving data from a source in order to begin moving it to a new destination — often one designed to support online analytical processing (OLAP).|
|Transformation||Data transformation is the process of changing data as or after it is moved from a source to a destination.
Businesses may need to transform data when different database systems store data in different formats, or when data must be changed from a source format into a different format at a destination.
|Loading||Loading refers to the process of delivering data into a destination database after the data has been extracted from a source. Data may be transformed prior to loading (ETL) or post loading (ELT).|
|ETL||ETL — extract, transform, load — refers to the process of extracting, transforming, and loading data into a new destination.|
|ELT||ELT — extract, load, transform — is similar to ETL, but the data is transformed after it has been delivered to its destination, which is usually a cloud data warehouse.|
Try Stitch for free for 14 days
- Unlimited data volume during trial
- Set up in minutes
Data integration and ETL
In the ETL process, an ETL tool extracts data from a source, loads it into physical or virtual storage, and transforms it as necessary for analysis. The sequence of these steps may vary, depending on the destination: on-premises or in the cloud.
ETL — extract, transform, load
ETL — extract, transform, load — is effective for integrating data to on-premises data centers. On-premises hardware has a limited ability to scale when processing demands spike, so it makes sense to do as much processing as possible in the data pipeline. Also, security concerns or regulatory requirements may make it impossible to store raw data in a destination, so the tool must transform the data before loading it to the destination.
The downside to ETL is that the transformation process can be resource-intensive, slowing the data integration process.
ELT — extract, load transform
When the destination is a cloud-native data warehouse like Amazon Redshift, Google BigQuery, Snowflake, or Microsoft Azure SQL Data Warehouse, ELT is a better approach. Organizations can transform their raw data at any time, when and as necessary for their use case, and not as a step in the data pipeline.
Data integration and business
Data integration was a $6.56 billion global industry in 2017, and that figure could reach $18.45 billion by 2025, according to the report “Global Data Integration Market – Size, Outlook, Trends and Forecasts (2019 – 2025)” by Envision Intelligence, a market research firm.
The data integration tools and technology are designed to turn raw data into insights. Business reports and dashboards tell decision-makers what’s happening in their business, and also may shed light on why it’s happening.
Data is a valuable enterprise resource, but its usefulness is limited unless the data is current, accurate, and available to everyone in the organization who would benefit from analyzing it.
Busting information silos
Enterprise data often is siloed in organizational departments: financial data stays in the finance department, and marketing data stays in the marketing department. Data integration busts information silos by making data available to more decision-makers, who can use BI tools to get insight into improving operations, improving products, saving money, and discovering hidden opportunities. By assembling and analyzing data from multiple sources through integration, business leaders can discover trends and patterns that might have been impossible to find before.
Handling the constant flow of data
Data on sales, expenses, and other business activities is in constant motion. Literally every second, transactional systems generate and store data, and organizations add new data sources all the time. Integration brings in fresh data for analysis as it’s needed.
The data integration process
As you think about the data integration process, begin with a checklist to ensure that you don’t overlook any business or technical considerations. Here is an example of a basic checklist:
- Who will lead the data integration project? Who else should be involved?
- Will the proposed process/solution be used by IT staff only, or will other business departments need access?
- Will roles be clearly defined?
Business goals and objectives
- What’s the main reason for this data integration project? Are there any secondary reasons?
- How will you measure the success of this project?
- What specific outcomes do you seek?
- What’s the expected timeline for the data integration project?
The existing process
- What processes, if any, are currently used for data integration?
- What hardware/software systems are in place?
- Are there any existing cloud-based data stores?
- In what ways will current processes/systems change when a new data integration process is in place?
- Will additional, fewer, or different employees be needed to ensure the success of the new process?
Identify the data
- Can you identify the source(s) of your data?
- What do you currently get from your data?
- What would stakeholders like to get from the data?
Evaluate data integration solutions
- Will you build your own, or purchase on-premises or cloud-based solution/tools?
- What is the cost of the proposed solution?
- Will the proposed data integration solution be able to scale to satisfy changing needs?
- Will you need additional hardware or software?
- How much maintenance will be required?
- Can your team use and maintain the data integration solution, or will it require additional hires or outside assistance?
- Have you established KPIs and metrics?
- Who will manage the new solution? Who will know when the solution needs to scale up or down?
- How often will maintenance be required? What sort of expertise is required for this?
This checklist is not exhaustive, and every organization will have unique things to consider. But using a similar checklist to begin the data integration process will help to get stakeholders on the same page, and may spur conversations and generate useful ideas.
Getting started with data integration
Building your own data integration solution may seem tempting if you’re facing a smaller project with few data sources. In fact, however, creating an ETL platform from scratch requires no small amount of software development expertise, and adds another in-house application that needs ongoing maintenance.
It doesn’t have to be this way. Stitch is a cloud ETL service that provides connectors from more than 90 sources to the most popular data warehouse destinations. We make it easy to extract data from more than 90 sources and load it to an EDW in a flash. Our approach is simple, straightforward, and ready to go right out of the box. Give Stitch a try, on us — set up a free trial in minutes and get your data into an EDW today.