

4 Questions to Ask Before You Start a Machine Learning Project

Kostiantyn Didur

Machine learning can be a powerful tool for businesses, but is it right for yours?

AI and machine learning are making a significant impact on multiple industries and changing the landscape of our society. These are not just hot trends; they are here to stay.

Still, machine learning is not a magical solution that applies to every single use case. Too often, companies embark on an AI development journey without a clear understanding of the value it should bring to their business. As a result, many data science and machine learning projects lack clear KPIs and simply drain R&D budgets.

That’s why managers have to ask themselves four key questions to justify the need for machine learning development.

1. Do you really need machine learning?

Machine learning projects are costly and time-consuming. Additionally, estimates for AI R&D projects are often vague and unrealistic. Most importantly, not every company needs machine learning in the first place. For instance, we argue in a separate article that solid data engineering alone can be enough to find invaluable business insights for companies across numerous industries.

The shortage of AI engineers is also an enormous challenge, as talent availability is a significant constraint across the globe. Machine learning still has certain limitations, and it currently doesn’t fit every business case in every domain. According to TechCrunch, all machine learning use cases can be split into two categories:

  1. Classification models break down large datasets into meaningful subsets, assigning each item to a category. Examples include image recognition and natural language processing.
  2. Regression models identify trends to predict numerical values. Typical use cases include sales forecasts that take into account thousands of factors, from macroeconomic indicators to weather forecasts to political threats.
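To make the distinction concrete, here is a minimal Python sketch on made-up toy data (none of it comes from TechCrunch or this article): a nearest-centroid classifier returns a category, while a least-squares line fit returns a number. The function names, the two-feature "cat"/"dog" vectors, and the ad-spend and sales figures are all hypothetical; real projects would use far richer models.

```python
from statistics import mean

def nearest_centroid_classify(train, point):
    """Classification: return the label whose average feature vector is closest to `point`."""
    centroids = {}
    for label in {lbl for _, lbl in train}:
        rows = [x for x, lbl in train if lbl == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return min(centroids, key=lambda lbl: sq_dist(centroids[lbl], point))

def fit_line(xs, ys):
    """Regression: ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    return slope, y_bar - slope * x_bar

# Classification outputs a category (hypothetical two-number image features)...
animals = [([0.9, 0.1], "cat"), ([0.8, 0.2], "cat"), ([0.1, 0.9], "dog"), ([0.2, 0.8], "dog")]
print(nearest_centroid_classify(animals, [0.85, 0.15]))  # cat

# ...while regression outputs a number, e.g. sales predicted from ad spend (invented figures).
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(ad_spend, sales)
print(slope * 5.0 + intercept)  # 11.0
```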

Research institutions and tech companies have made massive progress in certain areas of machine learning, including computer vision, speech recognition, and natural language processing. Still, this technology is not a silver bullet. As of 2018, you cannot apply machine learning to every business case you might have in mind.

For instance, building a computer vision engine that can identify a particular bottle in a row of identical bottles can prove too costly. That is why it makes sense to have a small team of data scientists investigate the use case before proceeding with development.

2. What type of machine learning do you need?

Currently, there are three major types of machine learning: supervised, unsupervised, and reinforcement learning. Let’s check out the use cases for each one of them.

Supervised learning

Nearly 90 percent of current machine learning development projects deal with supervised learning.

You have input data X and a target variable Y that you want to predict.

For instance, X could be parameters that describe a person, such as gender, age, and personal preferences. Looking at this input data, you want to predict Y: how likely the person is to click your marketing ad on Facebook.

This technique works when you have large datasets of customer information and historical records that show who clicked your ads in the past. A supervised machine learning model analyzes that input data to find patterns and predict which demographic groups are most likely to click your ad.

Other use cases for supervised learning would be credit scoring, underwriting, equipment failure detection, and more.
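For the ad-click example, one common supervised technique is logistic regression, which outputs a probability between 0 and 1. The sketch below assumes two hypothetical features per person (age divided by 100, and whether they like sports) and a handful of invented historical records. It trains with plain batch gradient descent and is an illustration, not a production pipeline.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this record: predicted probability minus actual outcome.
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def click_probability(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical past records: [age / 100, likes_sports (0 or 1)] -> clicked (1) or not (0).
X = [[0.22, 1], [0.25, 1], [0.31, 1], [0.45, 0], [0.52, 0], [0.60, 0], [0.28, 0], [0.48, 1]]
y = [1, 1, 1, 0, 0, 0, 0, 1]
w, b = train_logistic_regression(X, y)
print(click_probability(w, b, [0.24, 1]))  # above 0.5: resembles past clickers
print(click_probability(w, b, [0.55, 0]))  # below 0.5: resembles past non-clickers
```

With real data you would hold out part of the records to check that the predicted probabilities generalize beyond the training set.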

Unsupervised learning

With unsupervised learning, there is only input data X and no target variable. The machine learning model then groups the input data according to similarities it finds on its own. AI algorithms work through huge datasets and often find patterns and dependencies that humans cannot identify.

This technique is often used for customer clustering in marketing. For instance, we can take the input data from the example above and let the AI engine group people by demographics and personal interests.
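A common way to do this kind of grouping is k-means clustering. The sketch below is plain Lloyd's algorithm on invented customer data with two hypothetical features (age divided by 100 and an interest score from 0 to 1). It assumes you choose the number of groups k up front and, for reproducibility, uses the first k customers as starting centroids; real projects usually pick starting points more carefully.

```python
from math import dist
from statistics import mean

def k_means(points, k, iterations=10):
    """Lloyd's algorithm: assign each point to its nearest centroid, then move centroids to cluster means."""
    centroids = [list(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            [mean(col) for col in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: [age / 100, interest score]. No labels are given.
customers = [[0.2, 0.9], [0.6, 0.1], [0.25, 0.8], [0.22, 0.95], [0.55, 0.2], [0.65, 0.15]]
centroids, clusters = k_means(customers, k=2)
for group in clusters:
    print(group)
```

The algorithm only reports that two groups exist; interpreting them (for example, "younger, highly interested customers") is still up to the marketing team.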

Reinforcement learning

With reinforcement learning, data scientists specify the rules of the “game”, the environment where the “game” takes place, and the final reward (in a chess analogy, that would be the victory). As machine learning algorithms start “playing the game”, they try different strategies and learn from their previous experience to maximize the final reward. One of the most famous examples of reinforcement learning is AlphaGo, developed by Google’s DeepMind.
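AlphaGo combined several techniques at a far larger scale, so the following is only a toy illustration of the reward-driven loop described above, using tabular Q-learning, a classic reinforcement learning method. The "game" is a hypothetical five-cell corridor where reaching the last cell earns a reward of 1; the learning rate, discount factor, and exploration rate are arbitrary assumptions.

```python
import random

N_STATES = 5        # hypothetical corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, 1)   # step left or step right

def step(state, action):
    """Rules of the toy game: move one cell (walls clamp), reward 1.0 only on reaching the goal."""
    next_state = max(0, min(N_STATES - 1, state + action))
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit current estimates, sometimes try a random move.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Nudge the estimate toward the reward plus the discounted value of the best next move.
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q

q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # greedy move per non-goal cell; 1 means "step right"
```

Nobody tells the agent to walk right; it discovers that strategy because the reward is only ever collected at the right end.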

Deep learning

Deep learning, a technique that uses artificial neural networks, is applicable to all three machine learning types, but is most often used in supervised learning. Deep learning is excellent at classifying objects based on their features. For instance, it can be used to categorize pictures of cats and dogs with high accuracy.

Deep learning is behind Facebook’s Face Recognition technology, which is 99 percent accurate. The same technology powers advanced natural language processing (NLP), image and speech recognition software, which can be used in document processing (e.g., legal documents), sentiment analysis and word-processing software.
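Real deep learning models learn millions of weights from data via backpropagation, which is beyond a short example. As a rough sketch of why stacking layers helps with classification, the network below uses two layers with hand-picked (not learned) weights to solve XOR, a pattern a single linear unit cannot separate: the hidden layer builds intermediate features and the output layer combines them. All weights and names are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """Fully connected layer: each neuron takes a weighted sum of all inputs plus a bias, then applies sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b) for row, b in zip(weights, biases)]

# Hand-picked weights: hidden neuron 1 behaves like OR, neuron 2 like NAND; the output neuron ANDs them.
HIDDEN_W, HIDDEN_B = [[20, 20], [-20, -20]], [-10, 30]
OUTPUT_W, OUTPUT_B = [[20, 20]], [-30]

def predict(x1, x2):
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]

for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, round(predict(*pair)))  # 0, 1, 1, 0
```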

3. Are you ready for extensive data engineering?

Machine learning and data science depend heavily on data engineering. Before any data science can happen, you need to extract data from fragmented sources, transform it into usable datasets, and load it into the AI engine. The bad news is that these tasks often cannot be automated. Different sources store data in different formats, so reconciling them requires a lot of manual work.

Even after the data has been extracted, transformed, and loaded, it might not be good enough for data science. So the next step is to clean the dataset by removing noisy data and filling in missing entries. ETL (extract, transform, and load) and data cleaning usually take up about 80 percent of the project’s time.
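The extract, transform, load, and cleaning steps can be sketched in a few functions. Everything here is hypothetical: two tiny inline sources in different formats (a comma-separated CRM export and a semicolon-separated web log), a join on customer id, dropping rows with impossible negative click counts as "noise", and filling missing ages with the median, which is only one of many possible imputation choices. The load step just appends to an in-memory list standing in for a real database.

```python
import csv
import io
from statistics import median

# Two hypothetical sources describing the same customers in different formats.
CRM_EXPORT = "id,age,country\n1,34,US\n2,,DE\n3,29,us\n4,41,FR\n5,38,de\n"
WEB_LOG = "customer_id;clicks\n1;12\n2;7\n3;-5\n4;3\n5;9\n"

def extract():
    crm = list(csv.DictReader(io.StringIO(CRM_EXPORT)))
    web = list(csv.DictReader(io.StringIO(WEB_LOG), delimiter=";"))
    return crm, web

def transform(crm, web):
    """Join both sources on customer id and normalize field types and formats."""
    clicks = {row["customer_id"]: int(row["clicks"]) for row in web}
    return [
        {
            "id": int(row["id"]),
            "age": int(row["age"]) if row["age"] else None,  # missing age kept as None for now
            "country": row["country"].upper(),               # "us" and "US" become the same value
            "clicks": clicks.get(row["id"]),
        }
        for row in crm
    ]

def clean(rows):
    """Drop noisy rows (missing or negative clicks), then fill missing ages with the median age."""
    rows = [r for r in rows if r["clicks"] is not None and r["clicks"] >= 0]
    fill = median(r["age"] for r in rows if r["age"] is not None)
    return [{**r, "age": r["age"] if r["age"] is not None else fill} for r in rows]

def load(rows, table):
    """Stand-in for writing to a database: append to an in-memory list."""
    table.extend(rows)

warehouse = []
load(clean(transform(*extract())), warehouse)
print(warehouse)
```

Even in this toy, each source needed its own parsing rule, which hints at why this work is hard to automate across many real sources.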

4. Do you need custom development, API software, or startup acquisition?

There are three strategies for companies to adopt machine learning.

1. Build a machine learning solution from scratch. This is probably the riskiest option, as only an estimated 10 percent of machine learning R&D projects succeed. Still, it is the most viable option for some narrow machine learning cases in specific domains.

2. Explore machine learning with cloud engines from Google, Amazon, and the like. This is the easiest way to gain access to machine learning technology. On the downside, you cannot freely configure system parameters. For instance, Amazon uses only logistic regression models, so its service is practically useless if a particular project requires different models. As a result, more sophisticated machine learning projects require custom development. Furthermore, 80 percent of machine learning development is still about big data engineering, which is something you cannot delegate to Amazon.

3. Buy a machine learning startup. This is the most expensive option, and it suits only large companies.

Data science and machine learning often produce unexpected results and give invaluable insights. This technology is here to stay, and it’s going to evolve at an extremely fast pace. Answering the above questions will help you start your machine learning development journey.

Konstantin Didur is a senior marketing manager at N-iX with 8 years of digital marketing experience in the software development, insurance, and automotive industries.