About a month ago we released our report, The State of Data Engineering. The data we presented sparked an enormous response from the Hacker News community (nearly 160 comments), and more than 15,000 people read our blog post on the topic. Some of the responses were the normal HN heckling, but most of them made for an interesting discussion about a topic that clearly strikes a chord with developers working on data problems.
While some agreed that there is a shortage of talent in this area, the majority pointed to other reasons why companies are struggling to hire data engineers. I’ll share a few things that stood out to me from these conversations.
Reason #1: Your compensation is below market value
This was by far the most common rebuttal to the supposed shortage of data engineering talent. “If you are really serious about a shortage, you should be really serious about making offers that can be competitive,” wrote jnordwick, “but I keep seeing the same $150k offers. That isn’t a ‘shortage’ kind of offer.” This experience was echoed by several others, including whenwillitstop: “[I’m] pinged by companies obsessively for my big data skills, all trying to pay me less than I am currently making.”
This feedback aligns with what we found in Indeed’s salary ranges: most data engineer openings don’t exceed $130k. The delightfully named SmellTheGlove, currently working as a Director of Data Engineering, added, “I build teams and make the data move and land it clean so your Ph.D.s can do the smaaht stuff with it. I can stack BI and Analytics on top…” He said it would take $200k+ to get him to leave his current role, or more for a job in San Francisco. Most companies are not willing to pay that rate.
Reason #2: You don’t understand the value data engineers will deliver to your organization
Go one level deeper, and you find that the reason companies are unwilling to pay the market rate for data engineering talent is pretty simple: they don’t understand the value data engineers deliver.
In both the Hacker News thread and in our conversations with people working in this space, examples of companies willing to pay very well for this talent came up repeatedly. Netflix and Facebook (the second-largest employer of data engineers) are anecdotally known to pay data engineers north of $500k.
But outside of Netflix, Google, Facebook, and Wall Street, data engineers are reporting a high level of sensitivity to anything north of $200k. Hacker News fell into two camps about why this is:
Companies that think they need top-tier data engineering talent but don’t. As _derek_ put it, the average tech company doesn’t have “finance/Google/Facebook level needs for data engineers,” and as a result, “They can’t reasonably claim to need top-level skills and then beggar out on the cost.” In other words, companies pay big money for data engineering talent because it delivers a ton of value to their business. If you’re unwilling to pay a big salary, it might be an indication you don’t actually need that level of talent. You might just be viewing a top-tier data engineer as “ornamentation” for your engineering team. achompas, a data scientist, said it like this: “Data engineers often develop ETL pipelines or data warehouses, both of which are very useful if your company has a data team and very useless if it does not.” So before you go chasing the hottest big data talent out there, have a plan for how you will use it.
Companies underestimating the value of just getting the basics right. In my original post (and the report itself), we used janitors and plumbers as analogs for data engineers. This was not meant to disparage data engineers, janitors, or plumbers, but after reading through the feedback, I see that this is a sensitive point. As kafkaesq pointed out, “It pretty much takes a SV alpha-nerd (or aspiring CEO seeking to cater to them) to come up with language like that.” Point taken. In retrospect, I can see why this labeling is so important to those doing the work of data engineering — most companies still view data engineering as grunt work, and salary levels at many companies reinforce that idea. Data engineering often ends up being forgotten, under-appreciated work that no one else wants to do. And software developers of all stripes have encountered this. Here’s one story from mrharrison (emphasis is mine):
I have been thrown these projects at work before, where I’m the front-end engineer and I need to make some cool D3 visualization, but lo and behold the data is shit, and I have to help the back-end team make the data useable. It’s a mind-numbing job, that nobody wants, because it sounds like a one-month task to get a good REST API up and working, but it usually takes three months, because you have to go back and forth making sure the data is right, and there are always 10 tricky edge cases that you have to work some magic on. Not only that but you need to have smart people cleaning the data, so that you don’t make some big mistake down the line or your REST API is super slow, and you have to add another couple weeks or month to rework the data again. So that one month becomes three months, and most likely a year, because somebody will say that looks great but can we also add this, and it goes on and on. It’s literally a mind-numbing job that most nobody wants. Data cleaning is a super golden problem to solve.
Here’s another really painful story from SmellTheGlove (emphasis is mine):
Once upon a time I managed (and, frankly, also wrote a lot of the code for) a project integrating half a dozen sources each managing a block of our business (billing, coverage, claims). The data was awful coming in and we managed to get a bunch of business processes changed in addition to some pretty heavy cleansing steps that we wrote. In any case, this big fragmented mess of monthly and weekly stacked data became my integrated, clean warehouse. For the first time ever at this organization, I had coverage and claims records tying up at a rate of 100% without any manual intervention. We did this so that we could implement a modern finance ops process on top (being intentionally vague) that would allow us to manage this block more efficiently, save time, and even let us better invest — it was a two-year project including my data work. A handful of actuaries and analysts got promoted out of this as it was a BFD to the company. Yet, at the end of the year, when I got my review, I got our equivalent of the average rating, 3 of 5, etc., and like a 3% raise, and a shitty budget for my people too. From then on, I spent almost as much time out there promoting our team’s work as we did doing the work. We did considerably better the next year, and that’s been the way I’ve operated ever since. I market the work.
Even for people who genuinely enjoy working on the challenges of data management, a lack of understanding about the importance of the fundamentals can sap the joy from it. Defending the joys of data management, dizzystar wrote:
Some people (me) really enjoy working with data, from cleaning, munging, creating, sorting, pipelining, etc., and find front-end visualization production excessively boring and mind-numbing.… I enjoy writing a script that finds a bad piece of data, or a script that fixes up everything, or writing something that was once unable to run at all get converted to something that runs in 500ms.
The response from mrharrison:
I also think data is fun and don’t meant to belittle the job, but in real-world scenario it’s often detail-intensive, underappreciated, tons of edge cases, and extremely complex if you plan to make it scalable and fast.… Customers will often complain at how long it takes and want more. It starts to wear away at one’s drive and passion for data. It’s not the data aspect, it’s the job/deadline aspect.
And back to the language people use, from rch:
I’ve heard more than one CTO/Senior Engineer refer to people in these roles as ‘data grunts’ or something similarly dismissive. Then they’re mystified as to why solid engineers are so quick to move up or out, year after year.
So, let me make this clear – data engineering work is first and foremost engineering work. If you want to get on the data superhighway, these are the people building your roads and bridges. How’s that analogy? :) There’s clearly a huge gap right now in executive understanding about this work — they want all the fun of “doing big data” with no real understanding about the importance of infrastructure. And as a result, many are unwilling to pay for what is often perceived as boring maintenance work.
Reason #3: You’re screwing up the hiring process
For anyone in the process of recruiting data engineers, this feedback is invaluable, though the stories range from painful to downright frustrating. Here is one from ef5a0b0628 that should strike a chord with anyone interviewing for a role in this field:
Every time something comes up on HN about a talent shortage in a field related to software engineering, it hurts. I have been unsuccessfully looking for a full-time position since my last startup folded six months ago.… It seems people in this industry refuse to understand that some people are not perfect.… I never graduated college because I hated it with the very fiber of my being, so I am not particularly great at whiteboarding answers to algorithm questions off the top of my head in a high-pressure environment. If I need them during my job, I look up answers and learn from people who are much smarter than I am.
This person’s experience mirrored that of another commenter who left their job when their spouse received an offer in Europe. ultramagas expected challenges related to looking for remote work, but the reality turned out to be much harsher:
It was a summer of shitty timed hackerrank-style tests (virtual whiteboard hazing). I would tell my co-workers about them and they’d laugh in bewilderment at the questions that were asked in what should be a technical screener, and these are extremely smart and productive software guys that have started companies, written books, given conference talks… There’s definitely not a shortage of talent. It’s that every company thinks they need ‘A-players,’ when the vast, vast majority are doing a damn basic CRUD app.
protomyth said that this seems to be a frustration that holds true across all technical roles: “I’m starting to think that the message is if HR is going to do checklists then developers should really make sure they work mostly with contracts that use popular checklist items.” As pyb put it, “The system is optimized for the needs of HR people.”
A lack of understanding about what indicates a good match for a data engineering role often leaves candidates feeling like companies are “unicorn searching.” SmellTheGlove, who does most of their own hiring, weighed in again with this advice on hiring:
Look for challenges faced and problems solved
Pay less attention to tech used
Learning a specific tech stack is easy
Process and problem solving should be primary
So wait – is there a shortage?
Despite the comments referenced above, I do still think the data shows a talent shortage, and there are plenty of commenters who agree. Regardless, the advice from software developers is spot on. If your company is struggling to hire data talent, a shortage of data engineering talent might not be the root of your problem.
Data engineering work is hard, complicated, and can be incredibly frustrating for anyone lacking a natural affinity for it. Take this advice to heart, and use it to inform how you go about adding data engineering talent to the team.
If you want to read the report that sparked all this feedback, you can access it here. And if you want to start consolidating data today without begging one of your software developers to write a bunch of ETL scripts, I hope you’ll check out Stitch. We offer a free, 14-day trial, and you can sync five million rows of data per month free, forever.