What It Means to Be a Data Scientist Today

By Yao-Nan Chen, Machine Learning Scientist, Appier

Unless you have been living under a rock for the past few years, you already know that explosive growth in the volume of available data is disrupting business as we know it. This data can be a goldmine for businesses that know how to capture, analyze and use it to power artificial intelligence (AI) technology. And that’s where data science and my role come in.

IBM has predicted that demand for data scientists will increase by 28 percent by 2020. As far back as 2012, the Harvard Business Review called data scientist the sexiest job of the 21st century.

I have been working in data science since 2013 and I still come into work at Appier each day eager to solve new problems.

What Data Scientists Actually Do

Simply put, data science involves using data to generate solutions to practical, real-world problems. In the business world, examples revolve around AI-powered solutions, such as pushing recommendations to users based on their demographics or usage patterns, or analyzing why sales of a particular product are dropping.

Data scientists set out to solve such problems by first extracting and consolidating data, which we then analyze for patterns and trends. We use this to build predictive models, derive insights, and implement proofs of concept to test the proposed solution to the problem at hand. The problems that we work on are very specific and often have no single standard solution. Hence, data scientists are tasked with thinking outside the box to come up with a variety of possible solutions.

The impact of our solutions is known only once they are implemented. Often, if a solution fails to deliver the desired outcome, we have to go back to the drawing board and start over. But this just adds to the challenge and the excitement of trying to pin down that elusive solution and make it work.
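To make the workflow above concrete, here is a minimal sketch in Python. It assumes two hypothetical data sources (`web` and `store`, with made-up weekly unit sales), consolidates them, and fits a straight-line trend by ordinary least squares as a stand-in for a predictive model. The names, the figures and the choice of a linear trend are illustrative assumptions, not a description of any production system.

```python
def consolidate(*sources):
    """Sum unit sales per week across several {week: units} sources."""
    totals = {}
    for source in sources:
        for week, units in source.items():
            totals[week] = totals.get(week, 0) + units
    return dict(sorted(totals.items()))


def fit_trend(totals):
    """Fit an ordinary least-squares line through (week, units) points."""
    xs = list(totals.keys())
    ys = list(totals.values())
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical weekly unit sales from two channels (illustrative numbers).
web = {1: 40, 2: 38, 3: 35, 4: 31}
store = {1: 60, 2: 58, 3: 57, 4: 53}

totals = consolidate(web, store)        # extract and consolidate
slope, intercept = fit_trend(totals)    # analyze the trend
forecast_week_5 = slope * 5 + intercept # a very simple prediction

print(f"trend: {slope:.1f} units/week, week 5 forecast: {forecast_week_5:.0f}")
```

With these made-up numbers the fitted slope is negative, about 5.2 fewer units each week, which is the kind of signal that would prompt a closer look at why sales are dropping. A real proof of concept would test the forecast against data the model has not seen.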

What Makes for a Good Data Scientist

Of course, every job has some less lovable bits and the burden of the data scientist is data cleaning! In most cases, the data we gather is ‘dirty’, with errors and discrepancies in it. For example, data showing that sales of a product have dropped dramatically may simply mean that malfunctioning machines have failed to capture the data accurately.
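The sales example above can be turned into a simple check. The sketch below is one possible approach, assuming daily sales arrive as a list in which a missing reading is `None`. The function name `flag_suspect_days` and the parameters `window` and `drop_ratio` are hypothetical and would need tuning for real data.

```python
from statistics import median


def flag_suspect_days(daily_sales, window=7, drop_ratio=0.2):
    """Return indices of days that look like data-capture failures.

    A day is flagged when its value is missing (None) or falls below
    drop_ratio times the median of the previous `window` trusted days.
    """
    suspects = []
    trusted = []  # values that passed the check so far
    for day, value in enumerate(daily_sales):
        if value is None:
            suspects.append(day)
            continue
        recent = trusted[-window:]
        if len(recent) == window and value < drop_ratio * median(recent):
            suspects.append(day)
            continue  # keep suspect values out of the baseline
        trusted.append(value)
    return suspects


# Illustrative data: a sudden collapse on day 7 and a missing value on day 8.
sales = [100, 104, 98, 101, 97, 103, 99, 3, None, 102, 100]
print(flag_suspect_days(sales))  # [7, 8]
```

A flag is only a prompt to investigate: a real collapse in demand would trip the same check, so a person still has to confirm whether the machine or the market is responsible.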

Most data scientists will agree that data cleaning is the most boring part of this job. Our inside joke is that data science is 80 percent cleaning of data and 20 percent complaining about it!

Jokes aside, data cleaning is painstaking but important work. If it is not done right, the accuracy and reliability of every insight built on that data suffers.

Aside from this kind of assiduity and attention to detail, a good data scientist, no matter how good they are technically, must also have a thorough understanding of the business domain and the organization’s business goals. Our solutions have to be creative, but also useful and practical.

Keeping Up with the Latest Research

In this context, keeping up with the latest research in machine learning helps us stay on top of trends and monitor breakthroughs on specific problems. We don’t need to reinvent the wheel: if a particular problem has been solved before, we can build on that work.

I regularly read papers on advances in machine learning, as well as in the specific domains that I am interested in.

It’s equally important to engage in discussions with peers, keep track of their recent research and poll their opinions on machine learning trends. This will help you keep abreast of all that is happening in this area.

Growing Demand for AI Expertise

Unfortunately, there is a gap between the growing demand for data scientists and the supply of talent in the area. AI is a relatively new field and there is a shortage of people with the required expertise. What widens the gap is that not every data scientist is a good business person. They may be stellar at solving problems in an academic or research-based environment, but often fall short when it comes to real-world business problems.

Data scientists today must constantly evolve their skill sets. As the adoption of AI and deep learning grows, we are automating lower-level tasks and moving on to more complex problems. We already have some mature tools that can be used to build simple models for many business cases, and these are becoming simpler to use.

In the near future, data scientists will be required to know how to make use of problem-specific information. As AI becomes more complex, data scientists will need to work on more abstract problems and leave simple processing and analyses to automation software.


About the author:

Yao-Nan Chen is a Machine Learning Scientist at Appier. He has more than five years of experience in machine learning, data science and data engineering, and three years of experience with practical e-commerce recommendation systems. Prior to joining Appier, he worked at Yahoo Taiwan on e-commerce recommendation systems, app notification recommendation systems, model tuning for sales volume prediction, and more.

 

 

AI 101: Deep Learning

Imagine that you are a marketer looking to run a targeted marketing campaign. What if you had a tool that could easily segment your market on the basis of factors like economic status, purchasing preferences, online shopping behavior, etc. so that you could customize your approach and messaging to each segment for maximum impact and conversion?

This is the kind of insight that deep learning (DL)* can offer.

DL refers to a family of advanced neural networks that mimic the way the brain processes information and extract goal-oriented models from scattered and abstract data. What differentiates it from traditional machine learning is the use of multiple layers of neurons to digest the information.  

A DL program trains a computer to perform human-like tasks, such as recognizing speech or predicting consumer behavior. It is fed large amounts of data and taught what the desired output should be. The more data it is fed, the better it generally performs.

The program then applies calculations to achieve that output, modifying the calculations and repeating the cycle until the desired outcome is achieved. The ‘deep’, hence, refers to the number of processing layers that the data must pass through to achieve the outcome, and to how the learning algorithms are stacked in a complex, hierarchical manner. The more levels or layers there are, the ‘deeper’ the learning.
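The cycle described above, in which the program calculates an output, compares it with the desired one, adjusts and repeats, can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: a network with only two layers, a mean-squared-error measure of how far off the output is, and plain gradient descent as the adjustment rule. The function `train`, its parameters and the AND-gate data are all hypothetical choices made for the example; real DL systems use many more layers and specialized libraries.

```python
import math
import random


def train(samples, hidden=3, lr=0.05, target_loss=0.01, max_epochs=5000, seed=0):
    """Train a tiny two-layer network and return the loss after each pass."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Layer 1: inputs -> hidden units (tanh). Layer 2: hidden units -> output.
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    losses = []
    for _ in range(max_epochs):
        g_w1 = [[0.0] * n_in for _ in range(hidden)]
        g_b1 = [0.0] * hidden
        g_w2 = [0.0] * hidden
        g_b2 = 0.0
        loss = 0.0
        for x, y in samples:
            # Forward pass: the data flows through each layer in turn.
            h = [math.tanh(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
                 for j in range(hidden)]
            out = sum(w2[j] * h[j] for j in range(hidden)) + b2
            err = out - y
            loss += err * err / len(samples)
            # Backward pass: how much each weight contributed to the error.
            d_out = 2 * err / len(samples)
            g_b2 += d_out
            for j in range(hidden):
                g_w2[j] += d_out * h[j]
                d_h = d_out * w2[j] * (1 - h[j] ** 2)
                g_b1[j] += d_h
                for i in range(n_in):
                    g_w1[j][i] += d_h * x[i]
        losses.append(loss)
        if loss <= target_loss:  # the "desired outcome" has been reached
            break
        # Modify the calculations slightly, then repeat the cycle.
        for j in range(hidden):
            w2[j] -= lr * g_w2[j]
            b1[j] -= lr * g_b1[j]
            for i in range(n_in):
                w1[j][i] -= lr * g_w1[j][i]
        b2 -= lr * g_b2
    return losses


# Toy task: learn the logical AND of two inputs.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
losses = train(AND_SAMPLES)
print(f"passes: {len(losses)}, first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
```

Each pass through the loop is one turn of the cycle: the loss measures how far the output is from the desired one, and the weight updates are the "modified calculations". Adding more layers between input and output is what would make such a network deeper.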

DL can analyze huge volumes of data to detect patterns and predict trends and outcomes. This is especially interesting to marketers, for whom it has applications in predicting consumer behavior and campaign outcomes, marketing automation, sophisticated buyer segmentation and sales forecasting, to name a few use cases.

*Deep learning is not magic, but it is great at finding patterns.