The data science field has drawn a lot of attention ever since the Harvard Business Review named data scientist “The Sexiest Job of the 21st Century.” Job postings for data scientists far outstrip the supply of candidates, and the jobs pay a more than healthy salary to boot. In the years since that announcement, articles about “How to Become a Data Scientist” have popped up everywhere. But what do you do once you actually get there? What are data science jobs like? How do they differ from company to company? Just how high up the ladder can a data scientist climb? Data Scientist and Machine Learning Engineer Ghinwa Choueiter answers these questions.
Ghinwa has spent many years working in the data science field, and was recently nominated for the Arab Edition of the MIT Technology Review’s “Innovators Under 35” Award. She received her bachelor’s degree in computer and communications engineering in her home country of Lebanon before moving to the U.S. to earn a master’s degree in information technology and then a doctorate in electrical engineering and computer science, both at MIT.
Here’s what Ghinwa has to say not just about what it takes to become a data scientist, but also about what types of roles and companies to look for.
Ghinwa: It’s good to be comfortable with statistics. That’s very important, because you need to know how to analyze the data and then know the significance of that analysis. You should also look into machine learning classes, even though they often go really broad, while in an actual job you might zoom in really close on just one aspect of it. There are lots of online classes about data analytics, and also about frameworks for handling big data. You also need to know a programming language well. A lot of people like Python or R. Mathematicians like R, but I’m using mostly Python right now.
Ghinwa: I think there are three things you need: to be rigorous, as in mathematically rigorous; to be an adventurer; and to have a passion for the project you are working on. You can’t just have one or two. If you’re really math-y, you can get stuck in the approach and be afraid to make assumptions or simplify the problem. You’ll get stuck in the theoretical and try to fit everything under one model. That’s why you have to be adventurous and take leaps sometimes in order to find the answers you’re looking for.
Ghinwa: Well, you share a common code base, so you need to collaborate with each other and review the code as it grows. And you all need to make sure that it doesn’t break. It’s also useful to be part of a team because then you can discuss approaches with them. So if you’re looking at a problem and have no idea how to handle it, it’s nice to be able to take it to somebody else who can look at it from an entirely different perspective.
That said, the bulk of the work we do is on our own. When we’re working, we usually aren’t all gathered around one computer.
Ghinwa: Well, I suppose things are more standardized at larger companies. You have a better idea of how to do what you need to do; you don’t have to make as many leaps. But if you want to really change the way things are done, that change is really hard to make and takes a lot longer, because there are so many people working on a project and the processes are so entrenched.
At really young companies, the work we’re doing is a lot more exploratory and innovative. We don’t really know how we’re going to solve a problem at the beginning, so there’s more experimenting, and while we might be using basic pre-existing techniques, we’re adapting them to a totally new field. And we’re making changes and iterations all the time. You’re also less likely to be working with bigger data sets.
Ghinwa: A lot of the basic algorithms and approaches stay the same no matter how big the data set is. The basic statistical analysis that you use in the beginning to explore the data is also the same no matter how big the size.
What really changes are the tools and the framework you use. You really have to think big with big data. It’s not just ‘Oh, I have a hundred times more data’; you have to approach the data very differently. You can’t just process the data sequentially like you would with smaller data sets. You have to use cluster computing and process the data in parallel; otherwise there’s no way you could go through all of it.
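As a rough illustration of how the basic statistics stay the same while the processing changes, here is a minimal Python sketch (our own example, not code from Ghinwa’s work). It splits a dataset into chunks, summarizes each chunk independently, and merges the partial results with the standard pairwise-merge formula for variance, arriving at the same mean and variance a single sequential pass would give. The function names are hypothetical, and a thread pool merely stands in for the worker machines of a real cluster; in CPython, threads won’t speed up CPU-bound work like this, so on genuinely big data the same split, summarize, and merge pattern would run on a cluster computing framework instead.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List, Tuple

# Partial summary of one chunk: (count, mean, M2), where M2 is the
# sum of squared deviations from that chunk's mean.
Summary = Tuple[int, float, float]


def summarize(chunk: List[float]) -> Summary:
    """Compute count, mean and M2 for a single chunk (the per-worker step)."""
    n = len(chunk)
    if n == 0:
        return (0, 0.0, 0.0)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return (n, mean, m2)


def merge(a: Summary, b: Summary) -> Summary:
    """Combine two partial summaries with the pairwise variance-merge formula."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return (0, 0.0, 0.0)
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return (n, mean, m2)


def parallel_mean_variance(data: List[float], n_chunks: int = 4) -> Tuple[float, float]:
    """Return (mean, population variance), computed chunk by chunk in parallel."""
    if not data:
        raise ValueError("data must not be empty")
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Each chunk is summarized independently, as a cluster worker would do.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(summarize, chunks))
    # Merge the partial summaries into one overall summary.
    n, mean, m2 = reduce(merge, partials, (0, 0.0, 0.0))
    return mean, m2 / n


if __name__ == "__main__":
    values = [float(x) for x in range(1, 101)]
    print(parallel_mean_variance(values, n_chunks=4))
```

The design choice worth noting is that each worker returns a small summary rather than raw data, so only three numbers per chunk ever need to travel back to be combined, regardless of how large each chunk is.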
Ghinwa: Well, obviously the field you’re in is a big factor. I’m interested in fields that have an effect on the world, well, hopefully a positive effect, like health, or education, or energy conservation. So, depending on the field, the problem you’re trying to solve changes.
The field you’re in also changes how you approach problem-solving. I used to work in speech recognition, which already had decades of work backing it up. When I’d run an algorithm, I’d know what I was doing. Now I’m interpreting electricity signatures from homes and breaking them down into individual devices, and that’s a type of data that very few people get a chance to look at. So I really just kind of explore and hope that what I’m trying will have a useful end result, because I have no idea.
It’s nice. I find I’m very curious. You try, and you know, you’re flailing. You don’t even know what’s going to work. It’s very rewarding.
Ghinwa: You can go anywhere, really. Some people like to stay really close to the data, so they become, like, a senior analyst so they can keep their head down and crunch numbers as much as they like, and others go up and become a CTO, or even start their own company and become CEO.
It’s like a lot of tech jobs, where as you climb the ladder, so to speak, you get farther away from the really technical stuff. That said, if you’re in a small company, you can still spend a lot of time writing code and things like that if you want to.
Back when I first started, I thought, ‘I’m going to be an engineer, keep my head down, and stick to the data for all of my life,’ but as time passed, I started thinking, ‘Maybe I can grow and progress with this, try more leadership roles.’ So it really depends on what you want to do; that’s something you can choose for yourself.
Ghinwa: Well, first I look to see what type of field it’s in, to see if they’re working on a problem I think is interesting. That’s the big thing, especially because the job listings don’t usually say a lot about the specific work you’ll do; they’ll just mention a programming language you should know or some basic things like that.
What I want to know as I go through the interviews is whether I’ll be getting to try new things, like solve new problems and create new stuff, or whether I’ll be doing more maintenance-type work. I’d rather avoid maintenance work, and I want to keep adding to my skillset.
Ghinwa: I look for a strong technical background. That’s a math background, statistics, machine learning, programming languages. But I also want someone who is fun to be around, and someone who is self-driven, but who isn’t afraid of working with a team.
And that’s the word from Ghinwa! If you’ve got an analytical mind and aren’t afraid of taking some chances, data science might be right for you!