Data Science and Moneyball: A Profile of Pete Skomoroch, Principal Data Scientist at LinkedIn
July 13th, 2012 Rachel Hyman
Pete Skomoroch is a Principal Data Scientist at LinkedIn, where he leads a team that builds features like LinkedIn Skills. He was previously the Director of Advanced Analytics at Juice Analytics and a Senior Research Engineer on AOL’s Search Analytics Team. Pete earned his B.S. in Mathematics and Physics from Brandeis University and did graduate coursework in machine learning at MIT.
Metamarkets: What’s a normal day like for you as a data scientist?
Pete Skomoroch: Day to day, it varies quite a bit. I lead a team at LinkedIn that’s particularly focused on identity and reputation. There are other data teams here that are working on things like strategy, business analysis, or fraud protection. Our team is primarily building data products like LinkedIn Skills, which is an automatically generated, inferred set of topics that covers the universe of skills and expertise that our members have. There’s a lot of text mining and natural language processing involved to build something like Skills.
Then the other area that we focus on a lot is around reputation and identifying experts. Specifically, the challenge there is to find what properties make someone an expert in skills like big data or in Hadoop. Once you can infer who knows Hadoop, you can gather that set of people and then you look at how they are connected to each other. Where did they work, go to school, what’s their professional activity like? What are they saying online, who are they recommending, what companies do they work at? All this information basically gets crunched in a lot of advanced algorithms, and then we can find the most relevant people for that topic.
You can think about the applications of this within things like people search and hiring solutions. When you look for someone who knows Hadoop on LinkedIn, you actually get relevant people, as opposed to someone who just lists it on their profile. There’s more intelligence beyond just keywords. It’s somewhat analogous to what Google does with web pages.
Skills is also an example where we saw an opportunity to apply data to improve the user experience. Our algorithms process the data from millions of member profiles to suggest relevant skills you might want to add to your own profile. This all starts with building a prototype, engineering the site feature itself, then launching and iterating on the algorithm.
Metamarkets: So it’s an iterative process? You roll out the first version and then you get data on how well it’s working and improve upon the model?
Pete: Yes. Improving existing algorithms and data is another good portion of our time. For example, to improve suggested skills, we’ll examine which suggestions users accept or reject. Let’s say someone rejected all of our suggestions. That’s a pretty strong signal something’s wrong. We’ll segment the feedback from users, look at the click-through and conversion rates, and then we’ll find, “Oh, it seems that for people with a major in geography, we’re not doing so well. Let’s look deeper.”
We’ll investigate and think of ways to improve the model. There’s a lot of this exploration going on, and we use that exploration to build a hypothesis around how to improve things. We spend some portion of time building that new functionality, and then testing it.
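As an illustration of the segmentation Pete describes, here is a minimal Python sketch. It is not LinkedIn's actual pipeline: the event format, the function names, the sample data, and the 0.6 threshold are all assumptions made for this example. It computes the acceptance rate of skill suggestions per segment (here, college major) and flags segments that fall well below the overall rate, which is the kind of signal that would prompt a closer look.

```python
from collections import defaultdict


def acceptance_rate_by_segment(events):
    """Return {segment: accepted / shown} for (segment, accepted) pairs."""
    shown = defaultdict(int)
    accepted = defaultdict(int)
    for segment, was_accepted in events:
        shown[segment] += 1
        if was_accepted:
            accepted[segment] += 1
    return {segment: accepted[segment] / shown[segment] for segment in shown}


def underperforming_segments(rates, overall_rate, ratio=0.5):
    """Return segments whose rate is below `ratio` times the overall rate."""
    return sorted(s for s, r in rates.items() if r < overall_rate * ratio)


# Hypothetical feedback log: (member's major, did they accept the suggestion?)
events = [
    ("computer science", True),
    ("computer science", True),
    ("computer science", False),
    ("computer science", True),
    ("geography", False),
    ("geography", False),
    ("geography", False),
    ("geography", True),
]

rates = acceptance_rate_by_segment(events)
overall = sum(1 for _, ok in events if ok) / len(events)
flagged = underperforming_segments(rates, overall, ratio=0.6)
print(rates)
print(flagged)
```

In practice the flagged list would only be a starting point for the hypothesis-building Pete mentions; with small segments, a low rate may simply be noise.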
Metamarkets: What was your path to becoming a data scientist like?
Pete: I was always fascinated by physics and the mind and neuroscience. I initially went to school for neuroscience and biochemistry. Then I decided I didn’t like lab work as much as I liked cracking codes and analyzing signals. I was originally taking math and physics to support the neuroscience side of things, because studying neural signals requires a lot of math. Then I found I just liked that a lot more, and so I switched my major to physics and math.
As I was getting out of school, I realized that I actually have a strong bias towards building things and producing things that have an immediate impact. In physics, the time period before you see the results of your work can be pretty long.
Basically, I think data science has been a theme that’s been driving my career for a while. The first startup I was at back in 2000 was called ProfitLogic. The main idea of that company was that retailers for a long time had not been data-driven, but had made decisions based on rule of thumb or intuition or what they had always done in the past.
It was kind of like “Moneyball,” but instead of baseball it was in retail. The idea was that we could apply these mathematical models and data analysis to make better decisions about pricing for clothing and retail goods in general. I was an analyst there. There were a bunch of mathematicians and physicists that they hired along with software engineers. They all worked together to figure out elasticity of these goods and how much you should charge for them to optimize profit. That was the early days of working on what were at the time fairly large data sets from retailers, getting all of the point-of-sale transaction data, crunching odds and spitting out recommendations for users, in this case the retail buyers.
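To make the elasticity idea concrete, here is a small Python sketch. It is a textbook simplification, not ProfitLogic's method: it assumes demand follows a constant-elasticity curve q = A * p^(-e), estimates e from just two hypothetical price and quantity observations, and then uses the standard result that profit (p - c) * q is maximized at p = c * e / (e - 1) when e > 1. All names and numbers are made up for illustration.

```python
import math


def estimate_elasticity(price_1, qty_1, price_2, qty_2):
    """Estimate constant price elasticity from two (price, quantity) points.

    Uses the log-log slope: e = -d(ln q) / d(ln p).
    """
    return -(math.log(qty_2) - math.log(qty_1)) / (
        math.log(price_2) - math.log(price_1)
    )


def profit_maximizing_price(unit_cost, elasticity):
    """Price that maximizes (p - c) * A * p**(-e) for elasticity e > 1."""
    if elasticity <= 1:
        raise ValueError("no finite optimum when elasticity <= 1")
    return unit_cost * elasticity / (elasticity - 1)


# Hypothetical point-of-sale summary: at $20 the item sold 100 units,
# and at $10 it sold 400 units.
elasticity = estimate_elasticity(20.0, 100.0, 10.0, 400.0)
best_price = profit_maximizing_price(unit_cost=8.0, elasticity=elasticity)
print(round(elasticity, 3), round(best_price, 2))
```

A real retail model would fit elasticity across many transactions and account for seasonality, inventory, and markdown schedules; the two-point estimate here only shows the shape of the calculation.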
At that point, I was still groping around in the dark and applying some math, physics, modeling, statistics, and basic programming abilities. I spent the next couple years building up my software development skills. Then I worked at MIT Lincoln Laboratory and took some machine learning graduate coursework, and worked in biodefense, which is getting back to my interest in biology and also working with large data sets.
In this case, the data sets were streaming sensor data. We worked on things like protecting military facilities and civilians against biological weapons. This was a good chance to apply machine learning, detection, and pattern recognition to large data sets.
Then from there I moved down to DC and got into the consumer Internet space working at AOL Search. Each successive company I worked for drew me further down the path toward becoming a data scientist.
Some of the best data scientists I see often have worked in a few different domains. I think that helps with creativity and problem solving. A nice way to sum data scientists up that I’ve heard: “They’re better statisticians than your average programmer and they’re better programmers than your average statistician.”
That’s the hard skills side. Then on the soft skills side, I’d say they’re often very creative, which probably comes from having domain expertise in a number of areas and from having seen similar problems before. They’re able to think of ways to use data to solve problems that otherwise would have gone unsolved or been solved using only intuition.
Metamarkets: Do you find it challenging to have to work across different domains?
Pete: I think the skill set side is a little bit more challenging. You have to know a bunch of different things to be a data scientist: some programming, some stats, and a lot of these other things. It can be hard to keep sharp on all of those skills. I think that’s one challenge. Oftentimes when I list the things that a data scientist should know, it seems overwhelming. Some people may think they need to go very deep in one area, or they prefer to do that. Often, though, data scientists seem to be in the 80th percentile in a large number of areas in terms of skills, and maybe they’re a rock star in one particular area like machine learning or data visualization.
That’s the more difficult part. Working in different domains is good for people who are intellectually curious and just like solving problems in general. It can be a challenge, but it keeps life interesting. You see commonalities. If you take a really good data scientist and they’ve been working in bioinformatics, and then you drop them into a consumer internet company, they can often ramp up fairly quickly, pick up some domain knowledge and then start solving problems.
Metamarkets: What excites you at your job?
Pete: One thing I’m particularly interested in is the lens on the economy that we have, and the insights we can generate based on the large amounts of professional data flowing through our network. I’d say my specific mission as a data scientist is to build intelligent systems that help people make better decisions. That matches up well with LinkedIn, as our company mission is to connect the world’s professionals to make them more productive and successful.
At the macro level, the US economy is made up of millions of workers and companies. These companies have over 3 million unfilled jobs right now. At the same time, we have over 25 million people unemployed or underemployed. There’s obviously a mismatch there. There’s a gap.
One of the things our team is really interested in is looking at our data and trying to understand that gap, to understand both at an individual level and at a macro level what’s happening, and taking a pulse of the economy via our data.
I think that’s exciting, as it connects well with LinkedIn’s mission as well as my own personal mission. If people had the right information at the right time to help guide their careers, then maybe they could fill some of those jobs.
Think about the people who come in to interview at your company who are not quite getting that job but are almost there. By leveraging data, there’s a huge opportunity to understand what’s going on there in that inefficiency and understand, hey, if this guy had just known Java, or if this guy had just taken a statistics course, he probably would have made the cut.
At a micro level, that insight helps individuals. Data can help you understand that and help guide individuals to where the puck is going to be in five years and what skills they should be learning. That’s super valuable. You can find a lot of this information today on LinkedIn by looking at the pages we’ve created for trending skills like Data Science or DevOps. From an employer or from the country’s perspective, a workforce that knows which skills to acquire helps fill those unfilled jobs.
That’s the general idea I find really inspiring. That’s the thing that keeps me coming to work every day. Similar to the “Moneyball” example, there is potential to take a really difficult situation and actually turn it around through the power of data.
Click here for bonus interview questions, where Pete gives advice to aspiring data scientists and makes predictions for the future of the data scientist career. Next week Metamarkets will feature an interview with Hilary Mason, Chief Scientist at bit.ly.