“Data deduplication refers to the elimination of redundant data. In the deduplication process, duplicate data is deleted or linked together, leaving only one copy of the data to be stored” (Ellicium).
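To make "deleted or linked together, leaving only one copy" concrete, here is a minimal sketch of block-level deduplication. It assumes fixed-size blocks identified by their SHA-256 digest, which is only one of several possible strategies (real systems often use variable-size, content-defined chunking). The function names `dedupe_blocks` and `rebuild` are hypothetical, chosen for illustration. Each distinct block is stored once, and every occurrence in the original data becomes a reference (a link) to that single stored copy.

```python
import hashlib


def dedupe_blocks(data: bytes, block_size: int = 4096) -> tuple[dict[str, bytes], list[str]]:
    """Split data into fixed-size blocks and store each distinct block once.

    Returns (store, refs): store maps a SHA-256 hex digest to the single
    stored copy of a block; refs lists the digest of every block in order,
    so the original data can be rebuilt by following the links.
    """
    store: dict[str, bytes] = {}
    refs: list[str] = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        digest = hashlib.sha256(block).hexdigest()
        # Keep only the first copy of each block; later duplicates
        # are represented solely by a reference to that copy.
        store.setdefault(digest, block)
        refs.append(digest)
    return store, refs


def rebuild(store: dict[str, bytes], refs: list[str]) -> bytes:
    """Reassemble the original bytes by following each reference into the store."""
    return b"".join(store[digest] for digest in refs)
```

With a deliberately tiny block size, `dedupe_blocks(b"ABCDABCDABCDWXYZ", block_size=4)` produces four references but stores only two blocks (`ABCD` once and `WXYZ` once), and `rebuild` returns the original bytes unchanged.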

“Deduplication and data linkage are important tasks in the preprocessing step for many data mining projects. It is important to improve data quality before data is loaded into a data warehouse” (International Journal of Computer Applications).
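As a preprocessing step before a warehouse load, record-level deduplication can be sketched as follows. This assumes records are dictionaries of string fields and that two records are duplicates when their chosen key fields match after simple normalization (lowercasing and collapsing whitespace). The field names `name` and `email` and the helpers `normalize` and `dedupe_records` are hypothetical.

```python
def normalize(value: str) -> str:
    """Lowercase and collapse runs of whitespace so trivially different spellings match."""
    return " ".join(value.lower().split())


def dedupe_records(records: list[dict[str, str]], key_fields: list[str]) -> list[dict[str, str]]:
    """Keep the first record seen for each normalized key and drop later duplicates."""
    seen: set[tuple[str, ...]] = set()
    unique: list[dict[str, str]] = []
    for record in records:
        # Build a match key from the normalized values of the chosen fields;
        # a missing field is treated as an empty string.
        key = tuple(normalize(record.get(field, "")) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


rows = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "ada  lovelace", "email": "ADA@example.com "},
    {"name": "Alan Turing", "email": "alan@example.com"},
]
clean_rows = dedupe_records(rows, ["name", "email"])
```

Here the second row normalizes to the same key as the first, so `clean_rows` keeps only the Ada Lovelace and Alan Turing records. Exact matching on normalized keys is a simplification; record linkage in practice often relies on fuzzy or probabilistic matching to catch near-duplicates such as misspelled names.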


A definitive guide to data definitions and trends, from the team at Stitch.
