"Data engineering is a new enough role that each organization defines it a little differently. However, broadly speaking [the] job is to manage the data and make sure it can be channeled as required" (Spandan Bemby).
As with data science, data engineering is often defined in terms of what its practitioners — data engineers — do. "A data engineer transforms data into a useful format for analysis" (Dataquest). This can involve data acquisition, cleaning, conversion, disambiguation, deduplication, and modeling. "The data engineer gathers and collects the data, stores it, does batch processing or real-time processing on it, and serves it via an API to a data scientist who can easily query it" (Insight Data).
"Data engineers are concerned with the production readiness of that data and all that comes with it: formats, scaling, resilience, security, and more" (O'Reilly). They build and maintain an organization’s data pipelines and "clean and wrangle data into a usable state."
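To make the cleaning, conversion, and deduplication steps mentioned above more concrete, here is a minimal sketch in Python. The record layout (`email`, `amount`) and the function name `clean_records` are assumptions invented for illustration; they do not come from any of the quoted sources, and a production pipeline would likely rely on a dedicated framework rather than hand-written loops.

```python
from typing import Dict, Iterable, List


def clean_records(raw: Iterable[Dict]) -> List[Dict]:
    """Normalize, convert, and deduplicate raw records.

    The schema (email, amount) is hypothetical and used only for illustration.
    """
    seen = set()
    cleaned = []
    for rec in raw:
        # Cleaning: trim whitespace and lowercase the key field.
        email = (rec.get("email") or "").strip().lower()
        if not email:
            continue  # drop records that have no usable key
        # Deduplication: keep only the first record seen for each key.
        if email in seen:
            continue
        seen.add(email)
        # Conversion: amounts arrive as strings and are stored as floats.
        cleaned.append({"email": email, "amount": float(rec.get("amount") or 0)})
    return cleaned


raw = [
    {"email": " Ada@Example.com ", "amount": "10.5"},
    {"email": "ada@example.com", "amount": "10.5"},  # duplicate after cleaning
    {"email": None, "amount": "3"},                  # no key, dropped
    {"email": "bob@example.com", "amount": "7"},
]
result = clean_records(raw)
print(result)
```

Here the two spellings of the same address collapse into one record only because normalization runs before the duplicate check, which is why the order of the steps matters.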
A data engineer whose resume isn’t peppered with ... high-tech tools for data storage and manipulation probably isn’t much of a data engineer. But as important as familiarity with the technical tools is, the concepts of data architecture and pipeline design are even more important. The tools are worthless without a solid conceptual understanding of:
- Data models
- Relational and non-relational database design
- Information flow
- Query execution and optimization
- Comparative analysis of data stores
- Logical operations
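As one illustration of why concepts such as relational design and query execution matter more than any particular tool, the sketch below uses Python's built-in `sqlite3` module to compare the query plan for the same query before and after adding an index. The `events` table, its columns, and the index name `idx_events_user` are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical relational model: one table of user events.
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events (user_id, payload) VALUES (?, ?)",
    [(i % 100, f"event-{i}") for i in range(1000)],
)


def plan(sql: str) -> str:
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The fourth column of each row holds the human-readable plan detail.
    return " | ".join(row[3] for row in rows)


query = "SELECT payload FROM events WHERE user_id = 42"

plan_before = plan(query)  # without an index, SQLite scans the whole table
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
plan_after = plan(query)   # with the index, SQLite can search by user_id

print("before:", plan_before)
print("after: ", plan_after)
```

The query text is identical in both cases; only the physical design changed, which is the kind of trade-off the concepts listed above help an engineer reason about.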
A definitive guide to data definitions and trends, from the team at Stitch.