Data Scientist Profile: Hilary Mason, Chief Scientist at bitly
August 7th, 2012, by Rachel Hyman
Hilary Mason is Chief Scientist at bitly. She co-founded HackNY, which aims to create a community of student hackers, and edits the dataists blog. Hilary was formerly a computer science professor and studied computer science and machine learning at Brown.
Metamarkets: What’s a typical day like at bitly for you?
Hilary Mason: There really is no typical day, which I find exciting. My title is Chief Scientist, and I lead our team of data scientists and engineers. As a team, we do business analytics to help the company make better decisions. We also do research; that is, we’re pushing the boundaries of how people understand the social and time-series data that comes through bitly. And finally, we do engineering: we take the results of that research and sometimes build systems on it, like a real-time search engine. We generally build those to the API level, but very occasionally we build a human-facing product as well. So it’s a mix of different kinds of thinking and different kinds of activities. We also do a fair amount of communicating to the outside world about what makes bitly interesting, which involves a lot of writing and speaking.
Metamarkets: One of the things that you do at bitly is customer-facing development. What’s an example of how you might have used data science towards that end?
Hilary: The best example is the just-released Realtime, powered by bitly. It is a real-time search and attention navigation engine, and an extremely clear example of how our work has built a product. We fetch and analyze each page shortened through bitly. We pull out the keywords, and then we build continuous time-series models of clicks to those key phrases across all documents, which allow us to detect the ones getting a disproportionate amount of attention at any given time. Then we’re able to say, “OK. These are the stories that are bursting right now.” For example, Tom Cruise gets a fair amount of attention all the time, but we saw a huge spike when news broke that he was getting divorced.
Our goal was to give people the same sense of power and control over the data that we have. It’s a set of tools for navigating what people are paying attention to on the Internet right now. You can do queries like “Show me all the links about food that are being clicked statistically disproportionately in Brooklyn,” or “Show me what’s new in politics in French.” We’re really excited about getting this out to consumers and also, hopefully, getting people to build on the APIs underneath it.
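The interview does not say which time-series model bitly uses, so the following is only a minimal sketch of the general idea of flagging phrases that get "a disproportionate amount of attention." It swaps in a common, generic technique: an exponentially weighted moving mean and variance of clicks per key phrase, with a z-score test against that phrase's own baseline. It assumes click counts have already been aggregated per phrase per time interval; the class name `BurstDetector` and the parameters `alpha`, `threshold`, and `min_std` are hypothetical.

```python
import math
from collections import defaultdict


class BurstDetector:
    """Hypothetical sketch: per-phrase exponentially weighted mean/variance of clicks.

    A phrase is flagged as bursting when its latest click count sits far above
    its own recent baseline (a z-score test). This is a generic illustration,
    not bitly's production model.
    """

    def __init__(self, alpha=0.3, threshold=3.0, min_std=1.0):
        self.alpha = alpha          # smoothing factor for the moving statistics
        self.threshold = threshold  # z-score above which a phrase counts as bursting
        self.min_std = min_std      # floor so very steady phrases don't divide by ~0
        self.mean = {}
        self.var = defaultdict(float)

    def update(self, phrase, clicks):
        """Score this interval's clicks for `phrase`, then fold them into its baseline.

        Returns the z-score of `clicks` relative to the baseline *before* the update.
        """
        if phrase not in self.mean:
            # First observation becomes the baseline; nothing to compare against yet.
            self.mean[phrase] = float(clicks)
            return 0.0
        mean = self.mean[phrase]
        std = max(math.sqrt(self.var[phrase]), self.min_std)
        z = (clicks - mean) / std
        # Incremental exponentially weighted updates of mean and variance.
        diff = clicks - mean
        incr = self.alpha * diff
        self.mean[phrase] = mean + incr
        self.var[phrase] = (1 - self.alpha) * (self.var[phrase] + diff * incr)
        return z

    def is_bursting(self, phrase, clicks):
        """True if this interval's clicks exceed the phrase's baseline by `threshold` std devs."""
        return self.update(phrase, clicks) > self.threshold


if __name__ == "__main__":
    detector = BurstDetector()
    # Made-up counts: steady attention, then a sudden spike.
    for count in [100, 102, 98, 101, 99, 100, 400]:
        print(count, detector.is_bursting("tom cruise", count))
```

Comparing each phrase against its own history, rather than ranking by raw click volume, matches the Tom Cruise example: a phrase that is always popular only registers as bursting when it jumps well above its usual level.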
Metamarkets: So Realtime is a customer-facing product. Had you already built a tool like that just to use internally to see what content is trending?
Hilary: Yes, we had already built the underlying systems. At bitly, we always build everything as an API. We decided to build the human-facing product because APIs don’t tell stories well. It’s much better to have an interface you can click around in to explore and get a sense of what the data looks like.
Metamarkets: What backgrounds do the data scientists on your team have?
Hilary: My background is in machine learning and my colleague, Brian, in the information of people. We have one guy with a PhD in applied math. We have two physicists: one theoretical physicist and one computational astrophysicist. We have a couple of engineers, one of whom studied math, and one of whom has no formal education at all but is a total badass and has been building search engines for years. So it’s a pretty diverse skill set. Everyone is somewhere along the spectrum between being able to do interesting math and being able to write production code.
Metamarkets: The idea of data science is a recent one. Do you have thoughts on the field of data science or the title of data scientist? Do you think it’s a good way to describe things?
Hilary: Yes. Data science actually does deserve a new title. It’s not that anyone is doing anything that hasn’t been done before; it’s that this combination of skills and capacities in one professional hasn’t been standard before. I think the core capacities are, first, the ability to do math and to model the world, or a world, mathematically. Second is the ability to write code, at least well enough to express those models. And third is knowing what’s interesting, an innate curiosity, along with the ability to tell those stories. When I talk to business audiences, that last one turns into understanding business problems.
Metamarkets: What challenges does the field of data science face?
Hilary: Infrastructure development is, I think, the area in which the canonical data scientist is weakest, and we still need a huge amount of work there. The toolsets that we have are still really immature.
Metamarkets: And what advice would you give to someone who is interested in data science and thinks that this is a career that they want to pursue?
Hilary: My first piece of advice is to try it, and that’s easy. Just go online and find a dataset you’re interested in; there are tons from the government, and a bunch of companies have made datasets available. Then start to play with it, see if you can find something there that no one else has found before, and share it. If you do that successfully, I think you’ll be completely hooked.
Metamarkets: Tell me about a current project you’re excited about.
Hilary: The one I’m most excited about personally is a bunch of restaurant data hacking I’ve been doing around New York City. I have the menus for every restaurant that isn’t a fast-food restaurant, classified by neighborhood. I can do things like figure out where you should go for the highest density of Thai food or what the price points are for a cheeseburger. That’s been a lot of fun, just to satisfy my personal curiosity.
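The interview does not describe how these questions are computed, so here is a minimal sketch of the two example queries over menu data classified by neighborhood. It assumes each restaurant is a dict with hypothetical keys `neighborhood`, `cuisine`, and `menu` (item name to price), and it interprets "density" as the share of a neighborhood's restaurants serving a cuisine, which is only one possible reading. The function names are hypothetical, and the sample records in the usage block are made up.

```python
from collections import Counter, defaultdict
from statistics import median


def cuisine_share(restaurants, cuisine):
    """Return {neighborhood: fraction of its restaurants whose cuisine matches `cuisine`}."""
    totals = Counter()
    matches = Counter()
    for r in restaurants:
        totals[r["neighborhood"]] += 1
        if r["cuisine"] == cuisine:
            matches[r["neighborhood"]] += 1
    return {hood: matches[hood] / n for hood, n in totals.items()}


def item_price_by_neighborhood(restaurants, item):
    """Return {neighborhood: median price of menu items whose name contains `item`}."""
    needle = item.lower()
    prices = defaultdict(list)
    for r in restaurants:
        for name, price in r["menu"].items():
            if needle in name.lower():
                prices[r["neighborhood"]].append(price)
    return {hood: median(p) for hood, p in prices.items()}


if __name__ == "__main__":
    # Made-up records for illustration only.
    sample = [
        {"neighborhood": "Midtown", "cuisine": "thai", "menu": {"Pad Thai": 12.0}},
        {"neighborhood": "Midtown", "cuisine": "american", "menu": {"Cheeseburger": 14.0}},
        {"neighborhood": "Park Slope", "cuisine": "american",
         "menu": {"Cheeseburger": 10.0, "Bacon Cheeseburger": 12.0}},
    ]
    print(cuisine_share(sample, "thai"))
    print(item_price_by_neighborhood(sample, "cheeseburger"))
```

Using the median rather than the mean keeps a single very expensive burger from dominating a neighborhood's price point.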