Application integration vs. data integration: how they differ

Business processes use a lot of different kinds of data — payroll, billing, shipping, transactions, advertising, analytics ... the list goes on. All business systems maintain their own data, and each often has data that overlaps with other systems. Application integration and data integration are two approaches organizations can take to make use of data from different systems, but they meet different needs.

Application integration

When business processes encompass multiple applications, live data must move from one application to another, one event at a time, in close to real time. This might happen, for example, when an organization adds a new employee — data about the person might start in a human resources system, but also needs to be recognized in payroll, benefits, and other operational applications. Application integration software can orchestrate this process, serving as middleware between systems.

Application integration software can send data between multiple OLTP (online transaction processing) applications, from point to point, one application at a time. In one process a given application might serve as a source, while in another, it could be a destination. Application integration requires knowledge about business or application logic — you have to understand all the ways your organization uses data to get application integration right.

Application integration actions happen immediately as events occur. Timeliness is of the essence.
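The new-employee scenario above can be sketched as a small event dispatcher. This is only an illustration of the event-at-a-time pattern, not how any particular integration product works: the `payroll` and `benefits` dictionaries stand in for real applications, and the event name `employee.created`, the handler functions, and the field names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two downstream applications.
payroll = {}
benefits = {}

# Maps an event type to the handlers that should react to it.
handlers = defaultdict(list)

def subscribe(event_type, handler):
    """Register a destination application's handler for an event type."""
    handlers[event_type].append(handler)

def publish(event_type, payload):
    """Deliver one event to every subscribed handler as soon as it occurs."""
    for handler in handlers[event_type]:
        handler(payload)

def add_to_payroll(employee):
    # Payroll only needs the fields relevant to paying the employee.
    payroll[employee["id"]] = {"name": employee["name"], "salary": employee["salary"]}

def enroll_in_benefits(employee):
    # Benefits starts every new hire on an assumed default plan.
    benefits[employee["id"]] = {"name": employee["name"], "plan": "default"}

subscribe("employee.created", add_to_payroll)
subscribe("employee.created", enroll_in_benefits)

# The HR system (the source) emits a single event for a single new hire.
publish("employee.created", {"id": "e-100", "name": "Ada Example", "salary": 85000})
```

Each handler encodes a piece of business logic (what payroll needs, which plan a new hire gets), which is why this style of integration depends on knowing how the organization uses its data.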

Applications use data defined by fixed schemas. These schemas don't define the same sets of data, nor do they always define corresponding fields in the same way, so data pulled from one application must be transformed before a second application can use it. That transformation happens in the data pipeline, not to the data stored in either application. Pre-load transformations can alter data in many ways: standardizing column datatypes or values, changing names, dropping irrelevant attributes, deduplicating records, or validating data ranges, for example.
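As a rough sketch of what such a pre-load transformation might look like, the function below reshapes records from an imagined HR export into an imagined payroll schema. The field names (`EmployeeID`, `AnnualSalary`, `ParkingSpot`, and so on) and the salary bounds are assumptions made up for the example.

```python
def transform_hr_to_payroll(records):
    """Reshape HR records into a hypothetical payroll schema before loading."""
    seen_ids = set()
    output = []
    for rec in records:
        # Standardize the datatype and value of the key field.
        emp_id = str(rec["EmployeeID"]).strip()
        if emp_id in seen_ids:
            continue  # Deduplicate: keep only the first record per employee.
        seen_ids.add(emp_id)

        # Standardize the datatype, then validate the range (bounds are assumed).
        salary = float(rec["AnnualSalary"])
        if not 0 < salary < 10_000_000:
            raise ValueError(f"salary out of range for employee {emp_id}")

        # Rename fields to the target schema; anything not copied here
        # (such as ParkingSpot) is dropped as irrelevant.
        output.append({
            "employee_id": emp_id,
            "full_name": rec["Name"].strip().title(),
            "annual_salary": salary,
        })
    return output

hr_records = [
    {"EmployeeID": 100, "Name": " ada lovelace ", "AnnualSalary": "85000", "ParkingSpot": "B2"},
    {"EmployeeID": "100", "Name": "Ada Lovelace", "AnnualSalary": "85000", "ParkingSpot": "B2"},
]
payroll_rows = transform_hr_to_payroll(hr_records)
```

The source records are never modified; the changes exist only in the rows the pipeline hands to the destination.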

Prominent examples of enterprise application integration platforms include Talend, MuleSoft, Dell Boomi, and Jitterbit.

Data integration

By contrast, data integration is about compiling data from multiple sources into a single data repository. Typically data integration replicates data into a data warehouse for analytics and reporting, but data integration can also be used for migration and consolidation of operational databases. Data integration jobs usually run in batches, periodically — once an hour, twice a day, or at some other cadence, depending on how quickly data analysts need updated data — and may involve hundreds of thousands or millions of rows. It's possible to integrate streaming or real-time data instead of using batch processing, but doing so is expensive and usually not worth the tradeoffs.
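A periodic batch job of this kind can be sketched with Python's built-in `sqlite3` module. Both databases here are in-memory stand-ins, the `orders` and `orders_raw` tables are invented for the example, and using the highest `id` seen so far as a bookmark is just one simple way to find new rows.

```python
import sqlite3

# Hypothetical source application database and analytics warehouse,
# both in-memory SQLite so the sketch is self-contained.
source = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
warehouse.execute("CREATE TABLE orders_raw (id INTEGER, customer TEXT, amount REAL)")

def run_batch(last_seen_id):
    """Copy rows added since the previous run and return the new bookmark."""
    rows = source.execute(
        "SELECT id, customer, amount FROM orders WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    warehouse.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", rows)
    warehouse.commit()
    return rows[-1][0] if rows else last_seen_id

source.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("acme", 19.5), ("globex", 10.0)],
)
source.commit()
bookmark = run_batch(0)  # first scheduled run copies both rows

source.execute("INSERT INTO orders (customer, amount) VALUES (?, ?)", ("initech", 42.0))
source.commit()
bookmark = run_batch(bookmark)  # next scheduled run copies only the new row

loaded = warehouse.execute("SELECT COUNT(*) FROM orders_raw").fetchone()[0]
```

In practice a scheduler would call `run_batch` at the chosen cadence and persist the bookmark between runs. An `id` bookmark only catches inserted rows; catching updates would need something like a last-modified timestamp.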

Unlike application data schemas, data warehouse schemas can be created on the fly to accommodate the tables and columns an organization wants to use for analytics. Data warehouse schemas are also dynamic — data engineers can add or remove tables and columns at any time.
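To make the idea of a schema that changes on the fly concrete, here is a minimal sketch that adds a column to a destination table the first time a record carries a new field. It uses in-memory SQLite as a stand-in for a warehouse; the `events` table, the field names, and the `load_with_schema_evolution` helper are hypothetical, and the identifiers are assumed to come from a trusted source because they are interpolated into the SQL.

```python
import sqlite3

def load_with_schema_evolution(conn, table, records):
    """Insert records, adding any column the destination table lacks."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    for rec in records:
        for column in rec:
            if column not in existing:
                # Widen the schema instead of rejecting the record.
                conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{column}"')
                existing.add(column)
        cols = ", ".join(f'"{c}"' for c in rec)
        marks = ", ".join("?" for _ in rec)
        conn.execute(f'INSERT INTO "{table}" ({cols}) VALUES ({marks})', list(rec.values()))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER)")
load_with_schema_evolution(conn, "events", [
    {"id": 1},
    {"id": 2, "channel": "email"},  # new field appears in the source
])
columns = [row[1] for row in conn.execute('PRAGMA table_info("events")')]
rows = conn.execute("SELECT id, channel FROM events ORDER BY id").fetchall()
```

Rows loaded before the column existed simply hold NULL for it, so earlier data stays valid as the schema grows.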

Because the schemas are dynamic, data integration pipelines don't have to perform pre-load transformations. Organizations with cloud data warehouses often prefer to store raw data and transform it as necessary depending on the use to which they want to put it — predictive analytics, machine learning, business intelligence, or something else. However, some organizations prefer data pipelines that do transformations, which they use to prepare data before storing it.
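The "store raw, transform later" approach can be illustrated with a view defined inside the warehouse after loading. Again this is a sketch using in-memory SQLite; the `orders_raw` table, its messy sample values, and the `revenue_by_customer` view are invented for the example.

```python
import sqlite3

wh = sqlite3.connect(":memory:")

# Raw data is loaded exactly as it arrived: inconsistent names, amounts as text.
wh.execute("CREATE TABLE orders_raw (id INTEGER, customer TEXT, amount TEXT)")
wh.executemany(
    "INSERT INTO orders_raw VALUES (?, ?, ?)",
    [(1, " acme ", "19.50"), (2, "ACME", "5.50"), (3, "globex", "10.00")],
)

# The transformation runs after loading, as SQL inside the warehouse.
# Other teams could define different views over the same raw table.
wh.execute("""
    CREATE VIEW revenue_by_customer AS
    SELECT upper(trim(customer)) AS customer,
           SUM(CAST(amount AS REAL)) AS revenue
    FROM orders_raw
    GROUP BY upper(trim(customer))
""")

revenue = dict(wh.execute("SELECT customer, revenue FROM revenue_by_customer"))
```

Because the raw table is untouched, a machine learning team and a reporting team can each shape the same data differently without re-running the pipeline.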

The flow of data in data integration goes one way, from sources to an analytics repository — a data warehouse or data lake. Unlike application integration, data integration doesn't necessarily require knowledge of business processes — it just needs data sources and a destination.

The influence of cloud computing

Application integration and data integration share one important characteristic: Nowadays, most such software runs in the cloud. That's a big change from just a few years ago, when most enterprise applications ran on local servers in a company's data center. Cloud computing offers many advantages, including processing and storage resources that scale on demand and lower capital expenditures. That makes cloud-native software as a service (SaaS) an attractive option for most organizations.

So which is better?

If you're wondering which approach is better, you're asking the wrong question. It's not a matter of application integration being "better" than data integration, or vice versa. Each is fit for a different purpose. You can think of application integration as working with data at an application level, and of data integration as working with data at a database level.

Stitch Data Loader is a data integration platform, designed to bring all of your data into a place where you can create business intelligence reports and visualizations. Stitch aims to make raw data available quickly, and let data analysts transform it in different ways depending on their needs, so Stitch doesn't perform transformations except very basic ones, such as providing consistent datatypes for particular fields. To get started quickly, sign up for Stitch and begin moving data in five minutes.

Image credit: iBlogZone