“Data wrangling is the process of gathering, selecting, and transforming data to answer an analytical question. Also known as data cleaning or ‘munging,’ legend has it that this wrangling costs analytics professionals as much as 80% of their time, leaving only 20% for exploration and modeling” (Elder Research). “If you want to create an efficient ETL pipeline (extract, transform, and load) or create beautiful data visualizations, you should be prepared to do a lot of data wrangling” (Springboard). “It’s where most of the real value is created and it’s the most thankless, difficult, and poorly understood job I know of” (Dan Haight).

Data wrangling is “the process of programmatically transforming data into a format that makes it easier to work with. This might mean modifying all of the values in a given column in a certain way, or merging multiple columns together. The necessity for data wrangling is often a by-product of poorly collected or presented data. Data that is entered manually by humans is typically fraught with errors; data collected from websites is often optimized to be displayed on websites, not to be sorted and aggregated. If you work with SQL regularly, you’ll need to become really comfortable with these skills, as they are what will allow you to get to the fun stuff” (Mode).
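
As a rough illustration of the two operations the Mode quote mentions (changing every value in a column the same way, and merging columns), here is a minimal sketch using Python's built-in `sqlite3` module. The `contacts` table, its column names, the `wrangle` helper, and the sample rows are all hypothetical and invented for this example; real data will usually need more careful handling.

```python
import sqlite3


def wrangle(rows):
    """Tidy a small, hypothetical contacts table and return the cleaned rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contacts (first_name TEXT, last_name TEXT, state TEXT)"
    )
    conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", rows)

    # Modify every value in one column the same way:
    # strip stray whitespace and normalize the case of the state code.
    conn.execute("UPDATE contacts SET state = UPPER(TRIM(state))")

    # Merge two columns into one: build full_name from first and last name.
    cleaned = conn.execute(
        "SELECT TRIM(first_name) || ' ' || TRIM(last_name) AS full_name, state "
        "FROM contacts ORDER BY rowid"
    ).fetchall()
    conn.close()
    return cleaned


# Made-up rows imitating manually entered data with inconsistent spacing and case.
sample_rows = [
    ("Ada ", "Lovelace", " ny"),
    ("grace", " Hopper", "Va "),
]
cleaned_rows = wrangle(sample_rows)
print(cleaned_rows)
```

The sketch only trims whitespace and uppercases the state code; it deliberately leaves the capitalization of names alone, since deciding how to repair names is a judgment call that depends on the data.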
