Data Science is one of those professional terms that seems to have appeared suddenly and spread rapidly to industry conversations everywhere. You may have heard that the Harvard Business Review declared it “the sexiest job of the 21st century” back in 2012. Or maybe you noticed the term pop up in professional discussions, job ads or LinkedIn profiles. More than a decade after Data Science appeared on the scene, it’s clear that the field has established itself as a new, revolutionary way to analyze and use information. Because product managers, by the nature of their role, have to be at the forefront of understanding new technologies, I set out to explain what a data scientist does and what junior product managers should know when they consider using data science as part of product development.
What is Data Science?
Simply put, Data Science means using data and technology to make better decisions. It’s the craft of employing scientific methods, processes, algorithms, and systems to extract knowledge and insights from data. These insights are often used to make predictions, so that the owners of the data can forecast what is likely to happen next. Data scientists combine mathematical knowledge and programming skills to derive predictions from the data presented to them. However, they must also understand the business niche they are dealing with, and its specific characteristics, well enough to tell false predictions from valuable ones. The attempt to predict future behavior can be perceived as a scientific method of fortune telling, and this aspect is what gives data science much of its sexiness and mystique.
Where Did Data Science Come From?
The origins of data science lie in new approaches to statistics applied since the 1980s. The growth in the volume of data exchanged and stored around the world, together with the development of data processing technologies, opened new possibilities for analyzing data. This was revolutionary for many private companies, and the field first came into the public eye in 2006 with the Netflix Prize, a million-dollar competition. Netflix released a dataset of movie ratings it had compiled from its users, without including any personal information about the users. The data was used in an open competition for the best collaborative filtering algorithm, that is, an algorithm that predicts how a user will rate a film based on previous ratings. The competition spurred technological advances that not only helped create binge watching as we know it but also ushered in the use of algorithms to predict customer behavior. By 2009, the main prize of $1 million and several progress prizes had been awarded, and data science had its claim to fame.
How Does Data Science Work?
The process of a data science analysis is often described using CRISP-DM (the Cross-Industry Standard Process for Data Mining) and usually includes the following stages:
- Defining the problem at hand.
- Data processing: a labor-intensive stage that includes extracting, selecting, filtering and cleansing the data to make it ready for use.
- Machine Learning: applying different algorithms to the data to find the specific model that presents the most valuable results.
- Data Validation: making sure the model reflects reality and holds up on data beyond what was used to build it.
- Interpretation of the data: looking for valuable insights implied by the model.
- Integration: deploying the model in existing operations.
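The stages above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the support-ticket data, the choice of a straight-line model and the function name estimate_resolution_hours are all assumptions made up for the example, not part of any real product or of CRISP-DM itself.

```python
# Hypothetical data: (tickets_open, hours_to_resolve). None marks a missing value.
raw_rows = [
    (1, 2.1), (2, 3.9), (3, 6.2), (4, None), (5, 9.8),
    (6, 12.1), (None, 5.0), (7, 14.2), (8, 15.9), (9, 18.1),
]

# 1. Defining the problem: predict hours_to_resolve from tickets_open.

# 2. Data processing: drop rows that have a missing value.
clean = [(x, y) for x, y in raw_rows if x is not None and y is not None]

# Hold out the last two rows for validation.
# (A real project would usually split at random; this keeps the sketch simple.)
train, test = clean[:6], clean[6:]


# 3. Machine learning: fit y = slope * x + intercept by ordinary least squares.
def fit_line(points):
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(train)


def predict(x):
    return slope * x + intercept


# 4. Validation: mean absolute error on rows the model never saw.
mae = sum(abs(predict(x) - y) for x, y in test) / len(test)
print(f"Mean absolute error on held-out rows: {mae:.2f} hours")

# 5. Interpretation: the slope says how many extra hours each open ticket adds.
print(f"Each additional open ticket adds about {slope:.1f} hours")


# 6. Integration: expose the model through a function the product can call.
def estimate_resolution_hours(tickets_open):
    return round(predict(tickets_open), 1)


print(estimate_resolution_hours(10))
```

In practice each stage is far more involved, and the data processing step in particular tends to take most of the effort, but the order of the steps is the same.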
What Kind of Problems Can Be Solved with Data Science?
Typical challenges data scientists are faced with include:
- Regression problems: predicting a quantity with a continuous value, like price, location, or service time. One possible use for this is in predictive maintenance, where malfunctions can be predicted and therefore prevented before they happen.
- Classification: predicting a category for an incoming item. This can help predict whether a customer will click on your ad or buy in your store. It is also the principle behind spam filters.
- Clustering and Segmentation: grouping items into categories that were not defined in advance. For example, identifying different groups of customers among users.
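To make the difference between the three problem types concrete, here is a small Python sketch with one toy example of each. All the numbers are invented for illustration, and the techniques (a least-squares line, a nearest-neighbor lookup and a two-group k-means) are simply easy-to-read stand-ins, not the methods a data scientist would necessarily choose.

```python
import math


# Regression: predict a continuous value (a price) from a known input (a size).
def fit_line(xs, ys):
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


sizes = [50, 70, 90, 110]          # hypothetical apartment sizes
prices = [150, 210, 270, 330]      # hypothetical prices
slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 100 + intercept


# Classification: predict a category, learning from examples that already
# carry a label. Here each email is (number of links, exclamation marks).
labeled_emails = [
    ((8, 5), "spam"), ((7, 6), "spam"),
    ((1, 0), "not spam"), ((0, 1), "not spam"),
]


def classify(email):
    # Copy the label of the most similar known email (1-nearest neighbor).
    nearest = min(labeled_emails, key=lambda item: math.dist(email, item[0]))
    return nearest[1]


# Clustering: find groups without any labels. Here, monthly spend per customer
# is split into two groups with a very small k-means loop (k = 2).
def two_groups(values, rounds=10):
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(rounds):
        groups = [[], []]
        for v in values:
            closer_to_first = abs(v - centers[0]) <= abs(v - centers[1])
            groups[0 if closer_to_first else 1].append(v)
        # Move each center to the average of its group (keep it if empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups


monthly_spend = [12, 15, 11, 14, 95, 102, 99]
low_spenders, high_spenders = two_groups(monthly_spend)

print(predicted_price)
print(classify((6, 4)), "/", classify((1, 1)))
print(low_spenders, high_spenders)
```

The point to notice is what each one needs as input: regression and classification learn from examples where the answer is already known, while clustering is given only the raw values and proposes the groups itself.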
When Do Product Managers Need Data Science?
Broadly speaking, there are two possible scenarios in which data science can assist in product development:
- To improve the understanding of the data used in the product itself.
- To analyze data that gives insights into the product, its users or its marketing.
More than a buzzword, data science is a field of research that can help businesses gain insight into seemingly meaningless figures and even predict future success and failure. However, no data scientist will be able to help you if your data isn’t collected and stored in the right way. It’s important that every organization implement a data strategy to properly collect, store and classify its data so it can be used to perform the magic that is data science.