As we saw in our post detailing the 5 things you should know before getting a data degree, data science not only requires many areas of expertise, but is also constantly changing. As data scientist Amy Heineike explained, “The technologies and what we’re building on is evolving so so rapidly that even if you have a really good understanding of something right now, in two years you’re going to be out of date.”
The data science experts we interviewed made it clear that staying current in the field requires a good deal of upkeep. Randy Bartlett believes that even after a master’s degree, “you have to learn about 50% of data science on your own.” Edwin Chen went even further, saying that “data science still isn’t really something you learn in school, though more and more schools are offering data science programs.”
To learn more about what’s required of data scientists, we spoke with a number of experts in the field:
Randy Bartlett has held analyst roles at Citibank, Wells Fargo, PWC, and AstraZeneca, authored A Practitioner’s Guide to Business Analytics, and holds two patents for predictive modeling.
Edwin Chen has worked on ads quality at Twitter, quantitative analysis at Google, and data science at Dropbox. His blog is a must-read among data enthusiasts.
Jason Dolatshahi created a data science curriculum for General Assemb.ly and taught the first session of the introduction to data science course. He is currently the Manager of Data Science at Bonobos.
Amy Heineike co-authored Data Scientists at Work, was the Head of Mathematics at Quid, and is now the VP of technology at a stealth startup.
Rob Hyndman has written more than 100 research papers and five books. He is currently editor-in-chief of the International Journal of Forecasting.
Mark Madsen has received numerous information management awards, including the Smithsonian/Computerworld award for innovative use of information technology. He is the president of Third Nature.
Andreas Weigend is the former chief scientist at Amazon. He has written more than 100 scientific papers on machine learning techniques and is currently a professor at the UC Berkeley Social Lab.
From the answers our panel gave when asked whether you can teach yourself data science, we’ve compiled a list of five essential actions and attitudes that keep data scientists learning long after their degrees.
1. Go to events and join communities
Rob Hyndman provided insight into why it’s crucial for data scientists to connect with their peers through data communities and conferences:
Hanging around Q&A sites like crossvalidated.com is really useful. Typically someone who’s practicing in data science will also be attending conferences like useR! or their local data science meetup group or their local R user group. There’s often speakers coming through that they’re getting new ideas from, or they’re discussing some package that they’ve heard of. There’s a lot of self-learning happening that way.
At these events and on Q&A sites alike, data enthusiasts can connect with one another and discuss their latest findings and roadblocks.
If you’re overwhelmed by the sheer volume of conferences to choose from, here’s a post on the conferences data scientists won’t miss.
2. Focus on asking the right question, not how to use the right tool
Weigend believes that to keep learning, you have to avoid getting bogged down in software:
Don’t be swayed by consultants that tell you Hadoop is data science. It’s not about the plumbing; it’s what you do with it. Many consultants make money by selling you systems, but instead you should ask the right questions. That’s why data scientists come from other fields like physics; they are used to carrying out experiments, forming hypothesis. Know what tool to pick for a given problem, and formulate the question.
If you’re stuck on which software to use, read Which Big Data, Data Mining, and Data Science Tools go together?
3. Participate in Kaggle competitions
Kaggle is a platform where data scientists take data posted by companies and compete to see who can produce the best models for that data. Chen spoke about how open competitions like these are a fantastic way to practice your skills:
There’s plenty of data online just waiting to be analyzed (e.g., Kaggle competitions for machine learning, interesting public datasets through a bunch of initiatives), so just start doing it.
If you’re looking to start competing, read the Quora answers to What do top Kaggle competitors focus on?
4. Take online courses
Google “learn data science” and you’ll see just how popular online courses in data science are.
Prospective data scientists, and those just looking to stay ahead, can learn at their own pace, on their own time, and focus on exactly what they want to understand.
Dolatshahi explained what makes the growing number of online courses so valuable:
The classes offered now are extremely valuable because you don’t need to have a masters or any qualification if you’re interested in learning the material. The nice thing about taking a class is that it gives you a supportive learning environment with instructors, TAs, opportunities to ask questions, and a community of fellow students.
Hyndman also advocated the use of these programs:
Coursera has a fantastic data science program. I think there are four separate Coursera courses that run out of Johns Hopkins which are really good, which I recommend regardless of what your background is.
Interested? There are more than 150 data science programs you can take on Coursera right now.
5. Keep reading books, blogs, and articles
It’s hard to overstate the value of a good book. The same can be said of a well-constructed article written by a thought leader. Dolatshahi spoke about how important reading continues to be for his own education: “The most important parts of my experience have been reading articles and books, experimenting with technology, and talking to people.”
Bartlett likewise stressed how much his own education owed to applied books with examples, as well as books about software:
You can’t really do anything without software, so the books that teach you in the context of a particular software package are some of the best. Frank Harrell wrote a book on using R for survival analysis and basic regression. What you want are books written by practitioners, people who’ve actually done things in the field.
Don’t know what to read? We’ve got you covered there too.
Data science is evolving quickly. To keep up, you need to continually self-educate, but, as Heineike stresses, how you choose to focus that learning is also essential. This is the thought process she looks for when determining whether a data scientist is right for her team:
I want people who will bring something to the table, so maybe they have some expertise in some area of statistics or mathematics or computer science that’s kind of novel to the team and broadens what we can think about. You need to have a ferocious appetite for learning and know how to cope with continually not knowing what you’re doing — know how to continually be in a position where you have to learn a lot.