Implementing a data science process in your company

During my first Economics class in college, the professor asked what the purpose of a company is, and after five minutes of debate he wrote on the board:

Companies exist to earn money.

Simple and direct. So what is the use of implementing data science in a company?

Implementing a data science process serves to earn more money.

Introduction

In this article I want to introduce three basic ideas about data science.

First, a definition of data science, along with a view of why it can help a company improve its profits.

Next, a look at the AI Hierarchy of Needs by Monica Rogati, to understand the steps a company must complete to implement a data science process and take advantage of its full potential.

Finally, a review of the different profiles required to build a data science team, considering the size and type of company.

What is data science?

Data science is the process of collecting, cleaning, analyzing, visualizing and communicating data to solve problems in the real world.

The implementation of data science in different areas of a company seeks to improve its processes and increase its value. The idea is to transform data into information that increases revenue, reduces costs, improves business agility and enhances the customer experience.

Data science sometimes sounds like the holy grail. However, a successful implementation requires extraordinary effort: collecting data, processing, exploring and transforming it to obtain KPIs, metrics and valuable (and reliable) insights about the business, and even more effort to develop predictive analysis.

The final objective of data science is to obtain business-focused insights from data. To achieve this, the technical implementation is not enough: the business areas must also participate and be involved, in a recurring process of understanding and identifying business opportunities.

Main benefits of implementing a big data process

The main benefit is reliable information about our business (customers, products, workflows) that yields actionable insights and, in turn, more profit for the company.

1. Learn about your current and new customers

Many companies already hold a great deal of information about their customers: who they are, where they live, how they behave online, what their shopping history looks like, and which products they buy (or do not buy) season after season.

The next step is turning that data into information that can be used to understand the different customer profiles and preferences, as well as the possible actions and their associated outcomes.

Also, it is entirely possible to detect new customers and new opportunities by combining different data sources.

2. Forecast future scenarios

The last level of any data science implementation in a company is forecasting: once your data is reliable and consolidated, it is possible to analyze patterns and predict future behavior.

3. Improve the business decision-making process

Usually, business decision-making relies on the expertise of management and on analysis of business data. This process can be improved by simulating a variety of potential scenarios, using in-depth customer knowledge and forecasts of upcoming trends to guide us toward the best business outcomes.
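To make the forecasting benefit above more concrete, here is a minimal sketch of a straight-line trend fitted by ordinary least squares and extended a few periods ahead. It is only an illustration: the sales figures are made up, the function names are hypothetical, and a real project would likely need a richer model.

```python
from statistics import mean


def fit_linear_trend(values):
    """Fit y = intercept + slope * t by least squares, with t = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t = range(n)
    t_mean = mean(t)
    y_mean = mean(values)
    covariance = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, values))
    variance = sum((ti - t_mean) ** 2 for ti in t)
    slope = covariance / variance
    intercept = y_mean - slope * t_mean
    return intercept, slope


def forecast(values, steps):
    """Extend the fitted trend line `steps` periods past the last observation."""
    intercept, slope = fit_linear_trend(values)
    n = len(values)
    return [intercept + slope * (n + k) for k in range(steps)]


# Hypothetical monthly sales (illustrative numbers, not real data).
monthly_sales = [100.0, 110.0, 120.0, 130.0]
next_two = forecast(monthly_sales, steps=2)
print(next_two)
```

A trend line like this only makes sense once the underlying data is reliable and consolidated, which is why forecasting sits at the end of the process.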

The needs of a company that wants to implement data science

When a company decides to implement a data science process, it will need adequately prepared people to do it.
What background should they have? There is no single answer, except perhaps this one: you cannot expect a single person to cover the variety of tasks related to data science.

Figure: The AI Hierarchy of Needs, by Monica Rogati

What does collect mean?

Our initial step is to collect data. This data comes from a variety of sources: user-generated content such as surveys, information about customer behavior on our website, and sensor data, as well as other types of data such as images, audio and video, and even data generated in real time.

What these sources have in common is the variety of their types and origins, and the fact that the data cannot yet be analyzed at this early stage.

What does move and store mean?

The critical requirements of big data storage are that it can handle vast amounts of data, that it keeps scaling as the data grows, and that it provides the input/output operations per second (IOPS) needed to deliver data to analytics tools.

Because of the variety of data it encompasses, big data always brings challenges related to its volume and complexity. A recent survey says that 80% of the data created in the world is unstructured. One problem is how to give this unstructured data some structure before we attempt to understand it and capture the most critical parts. Another challenge is how to store it.

One important point is that there is no single best database for big data. There are suitable options depending on your needs. What do you want to do with your data? Your answer helps you determine the data storage needed for the problem you need to address.

If you need to scan and aggregate petabytes of data, that use case aligns with Hive.

If you need quick lookups over billions of rows of data, consider HBase.

If you want SQL syntax but need the capabilities of HBase, you might consider using Apache Phoenix with HBase.

If you need to store JSON data and require geospatial lookup functionality, MongoDB is a choice to consider.

If you want a solution that runs on AWS, then, depending on the use case, consider Snowflake or Amazon Redshift.

And the list goes on.

In big data land, there is no one-size-fits-all solution.

What does explore and transform mean?

Exploring the data means starting with an exploratory analysis to understand what we have: how many rows, how many columns, what types of files, and so on.

This quick glance gives us an idea of how to transform the data.

Transforming the data also involves cleaning incomplete records and deciding how to handle missing values.

It is essential to bear in mind that we have not yet started anything close to an in-depth analysis. So far, this is just a description of the steps needed to produce the first version of a dataset.
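The exploration and cleaning steps described above can be sketched with nothing more than the Python standard library. The CSV content, column names and the choice of filling missing ages with the median are all assumptions made for illustration; the right treatment of missing values depends on your data.

```python
import csv
import io
from statistics import median

# Hypothetical raw export; an empty field stands for a missing value.
RAW_CSV = """customer_id,age,city
1,34,Madrid
2,,Lima
3,27,
4,45,Bogota
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def describe(rows):
    """First glance: number of rows, number of columns, missing values per column."""
    columns = list(rows[0].keys()) if rows else []
    missing = {c: sum(1 for r in rows if r[c] == "") for c in columns}
    return {"rows": len(rows), "columns": len(columns), "missing": missing}


def clean(rows):
    """Convert types and fill gaps: median for age, 'unknown' for city."""
    known_ages = [int(r["age"]) for r in rows if r["age"] != ""]
    fallback_age = median(known_ages)
    return [
        {
            "customer_id": int(r["customer_id"]),
            "age": int(r["age"]) if r["age"] != "" else fallback_age,
            "city": r["city"] or "unknown",
        }
        for r in rows
    ]


rows = load_rows(RAW_CSV)
summary = describe(rows)
cleaned_rows = clean(rows)
print(summary)
```

The summary is the "quick glance", and the `clean` function records the decisions taken about missing values so that they can be reviewed later.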

What does aggregate and label mean?

Aggregation aims to combine several features into one feature that represents them all. For example, suppose you have collected the necessary information about your customers, in particular their age. To develop a demographic segmentation strategy, you need to distribute them into age categories, such as 16-20, 21-30, 31-40 and 41+. You use aggregation to create large-scale features based on small-scale ones. This technique reduces the size of a dataset while keeping the information that matters for the analysis, at the cost of the fine-grained detail.

Data labeling is a different process. It relates to ML algorithms and to training a predictive model on historical data with predefined target answers. The algorithm must be shown which target answers or attributes to look for. Mapping these target attributes in a dataset is called labeling.

Data labeling requires time and effort, because it means going through thousands of records and labeling each one. For example, if your image recognition algorithm must classify types of cars, these types should be clearly defined and labeled in the dataset.
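As a small illustration of the age example, the sketch below maps individual ages to the brackets named in the text and counts customers per bracket. The list of ages is made up, and the "under 16" bracket is an assumption added only so that every input has a category.

```python
from collections import Counter


def age_bracket(age):
    """Map an exact age to one of the brackets used in the example."""
    if age < 16:
        return "under 16"  # assumption: not part of the original brackets
    if age <= 20:
        return "16-20"
    if age <= 30:
        return "21-30"
    if age <= 40:
        return "31-40"
    return "41+"


# Hypothetical customer ages (illustrative only).
customer_ages = [18, 22, 25, 37, 41, 63, 30]

# Seven individual values collapse into a handful of bracket counts.
bracket_counts = Counter(age_bracket(a) for a in customer_ages)
print(dict(bracket_counts))
```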
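To show what "clearly defined and labeled" can mean in practice, here is a minimal sketch that checks a labeled dataset against a predefined set of classes. The file names, the car classes and the helper names are all hypothetical.

```python
from collections import Counter

# Hypothetical set of car types agreed on before labeling starts.
ALLOWED_LABELS = {"sedan", "suv", "truck"}

# Hypothetical labeled examples: (image file, label). One label has a typo.
labeled_images = [
    ("img_001.jpg", "sedan"),
    ("img_002.jpg", "suv"),
    ("img_003.jpg", "truck"),
    ("img_004.jpg", "suvv"),
    ("img_005.jpg", "sedan"),
]


def find_invalid_labels(examples, allowed):
    """Return the examples whose label is not one of the predefined classes."""
    return [(name, label) for name, label in examples if label not in allowed]


def label_distribution(examples, allowed):
    """Count how many valid examples exist for each class."""
    return Counter(label for _, label in examples if label in allowed)


invalid = find_invalid_labels(labeled_images, ALLOWED_LABELS)
distribution = label_distribution(labeled_images, ALLOWED_LABELS)
print(invalid)
print(dict(distribution))
```

A check like this catches labeling mistakes early, before they reach model training.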

What does learn and optimize mean?

At this point, our objective is to use the data to define metrics to track, build a dashboard to make follow-up easier, and produce reports to share the information.

A Key Performance Indicator (KPI) is a measurable value that demonstrates how effectively a company is achieving key business objectives.
A dashboard is a reporting tool that provides a visual display of organizational KPIs, metrics, and data. The objective is to give at-a-glance visibility into business performance across all units and projects. Dashboards can be designed so that the appropriate KPIs are front of mind and relevant to their intended audience. Through data visualizations, dashboards simplify complex data sets and give users at-a-glance awareness of current performance.
A report centralizes the process of delivering information to end users, organizations or applications through a BI software solution.
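As an illustration of what a KPI looks like once it is computed, the sketch below derives two common e-commerce metrics, conversion rate and average order value, from raw figures. The choice of these two KPIs and all the numbers are assumptions; each business has to pick the indicators that reflect its own objectives.

```python
def conversion_rate(visits, orders):
    """Share of visits that ended in an order (0.0 when there are no visits)."""
    if visits == 0:
        return 0.0
    return orders / visits


def average_order_value(order_totals):
    """Mean revenue per order (0.0 when there are no orders)."""
    if not order_totals:
        return 0.0
    return sum(order_totals) / len(order_totals)


# Hypothetical figures for one week (illustrative only).
weekly_visits = 2000
weekly_order_totals = [50.0, 70.0, 30.0, 50.0]

kpis = {
    "conversion_rate": conversion_rate(weekly_visits, len(weekly_order_totals)),
    "average_order_value": average_order_value(weekly_order_totals),
}
print(kpis)
```

A dashboard would display values like these over time, and a report would deliver them to the people who need them.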

What does AI and deep learning mean?

The last level of any AI strategy is the implementation of deep learning algorithms to create forecasts from our data.

To use these algorithms, it is necessary to first label the data so that it can later be fed into the ML models.

Different sizes of companies implement data science in different ways

The size of the company and the resources it wants to invest in data science have a direct relationship with the size of the data science team.

Small companies

Small companies have limited resources in almost every area, and this case is no exception: frequently we find just one person executing the data strategy.

In some cases, because it is difficult to find one person with the skill set to excel in every area, we see a small team composed of one data engineer (collecting, moving and storing the data) and one data scientist (cleaning, learning and generating insights).

Medium-sized companies

We can find a small team composed of three profiles: software engineers (to collect data), data engineers and data scientists.

The critical point is to gain some specialization in a group of steps of the process. Since a software engineer works on collecting data, the data engineer can work more productively on cleaning, moving and exploring the data. Likewise, the data scientist can focus on obtaining more insights and even create and implement some simple ML algorithms.

Large companies

Large companies are the opposite of start-ups: they have the resources to hire specialized people for each step of the process.

We can find large teams composed of software engineers, data engineers, data scientists, and machine learning engineers.

Conclusions

We want to earn money with any process or idea that we implement in a company, and data science can significantly increase our knowledge of our customers and processes.

With more information about our business, we can turn that information into earnings through increased sales, reduced costs, new customers, and better-informed decisions supported by forecasting.

References

The data science hierarchy of needs by Monica Rogati
What is data science? by Joma
Definition of data science from WTF is data science? by Dylan Gregersen
How is big data stored?
Which is the best database for big data?
Machine learning project structure
Definition of KPI
Definition of dashboard
