“A data pipeline is a set of actions that extract data (or directly analytics and visualization) from various sources. It is an automated process: take these columns from this database, merge them with these columns from this API, subset rows according to a value, substitute NAs with the median and load them in this other database” (Alan Marazzi).

“The purpose of a data pipeline is to move data from a point of origin to a specific destination. At a high level, a data pipeline consists of eight types of components:

More from the data glossary

A definitive guide to data definitions and trends, from the team at Stitch.

Give Stitch a try, on us

Stitch streams all of your data directly to your analytics warehouse.

Set up in minutes Unlimited data volume during trial