The people behind the Data Scientist.

The Data Scientist. Often described as the hottest job of the future. But what does a Data Scientist really do? What is his/her value for a company? Joline Jammaers and Jos Polfliet talk about their job at SAS Institute as a Data Scientist.

The need for a Data Scientist
Joline: “Today, more data is generated and captured than ever before. The types of data are more diverse, and companies don’t always leverage them to the fullest yet. This is where our job as data scientists plays a crucial part.”
Jos: “Being a data scientist has been called the sexiest job of the 21st century, and this is not only because we work with models all day long. If you look for ‘data scientist’ in Google Trends, you will see that the term started emerging around 2008 and that its popularity has grown exponentially since then. It is estimated that the total amount of collected digital information doubles each year. A very large part of all that data contains unknown valuable information because it has never been (properly) analyzed. Over the last decade, companies have started to notice that analyzing their data can give them a lot of insights and a competitive advantage. That’s where the data scientist comes in.”

The job of a Data Scientist
Joline: “The challenge lies not only in analyzing and visualizing data but also in creating competitive advantage from it. Data is becoming an asset in companies, and they expect a clear return on investment from it. On a daily basis, we go out and meet companies that want to capitalize on their data investments. We help them create a data strategy and integrate that strategy into their business model. Usually, our customers or prospects already have an idea of how to use their data. Together with them, we make sure that this idea becomes a viable business case. Once we have a clear definition of our business case, the real work can start.”
Jos: “In the mathemagical fairytale of my statistics and data mining courses, there was a perfectly ready dataset and a clear research question. Analyzing the data involved thinking about mathematical structures, trying to visualize 10-dimensional regression surfaces and rigorously checking all conditions and possible sources of bias. Life was beautiful. But just as Neo awoke from the Matrix, starting to work at SAS as a data scientist was quite a shock at first and proved to be an awakening later. In the real world, data is often messy, incomplete and in a ridiculously hard format. What I learned at university was how to be a data analyst, not a data scientist. Reading in and preparing the data requires a creative and technical approach that involves a lot of programming. Clearly defining what has to be analyzed or predicted is often an iterative process, carried out hand in hand with the customer. We try to understand their business problems and figure out a way to translate them into hard queries and solid numbers. This requires business insight, communication skills and a broad knowledge of different statistical techniques and approaches. Reporting a p-value in a business context is boring and unimaginative. We data scientists approach reporting as an artistic form of expression. Visualizations should be aesthetically pleasing and captivating, while having a high information density. Often, we find surprising connections and patterns that were previously unknown.”
Joline: “What Jos says is true. Most of the time, the data that we want to use resides in different data lakes, data marts or data warehouses. Not all data sources can be used without preprocessing. We use a lot of data quality and data integration techniques to make sure the format of the data is suited for analysis. Depending on the number of different sources to combine and the quality of the data, this part of our job can take up to 70% of the time needed for the project.
After constructing our first analysis table, we start exploring that table for insights. These insights are always checked with the key stakeholders in the project. It’s not only important that we get a good understanding of the data; we also have to make sure that this knowledge is shared with the customer.
The next phase is the one where we can let our creativity thrive. It’s the analysis phase, where we apply different statistical techniques to find insights that nobody has found before.”

Working as a Data Scientist at SAS
Joline: “At SAS, we have a dozen data scientists who are top-notch analytical gurus in their domain. It’s not uncommon for us to sit together and brainstorm on how to get the best results from the insights we gathered from our customers’ data. It’s really a team effort to come up with the best innovative ideas for analysis. Once we find the model that shows the best results, we go back to our stakeholders and explain what the model tells us. Nobody wants to rely on a model that they don’t understand, so we make sure that our statistical techniques are translated into business language. This is, for us, the point at which we find out whether our model is going to fly or die. After this stage, the model needs to be implemented, which means that it is embedded in the business processes. Depending on the project, it will be used behind the scenes to make real-time decisions on the actions that are taken, or it will serve as a source of strategic insights. It doesn’t stop there: the results of the models are carefully monitored over time to make sure the performance holds up. If it doesn’t, we have to find out why the performance is dropping and how we can solve this.”
Jos: “I agree with Joline: teamwork is crucial. What I also really love about working as a consultant at SAS Institute is that it drives us to the forefront of technology and pushes the limits of our own capabilities.”

The pleasure of being a Data Scientist
Jos: “Being a data scientist is wicked fun. Because it is such a multidisciplinary job, there is no room for boredom and routine. Every problem is something completely new, requiring an innovative way to look at it by combining existing knowledge with expert input and creative ideas.”
Joline: “Our project lead times are quite short (< 6 months) because we like to focus on quick wins and make sure that the time to results is limited. This is very motivating: as a data scientist, you’re eager to have a quick feedback loop, to know that your customer is satisfied and to be assured that your work is really generating business. Additionally, you know that the industry you’ll be working in six months from now can be completely different from today’s. That keeps our curiosity for new challenges fueled every day.”

To know more about our open vacancies at SAS, please go to