Background

Introduction

Data science has been attracting smart people with different backgrounds, experience, and domain knowledge. Moving into data science from another discipline is not a straightforward process. It takes time and effort, and how much depends on the person's motivation, background, and the industry where they want to work. New joiners follow multiple paths to sharpen their data science skills. Some of these paths are:

Online courses: Some e-learning platforms, e.g., LinkedIn Learning, provide practical courses that solve specific data science problems. Other platforms, such as Coursera, offer theory-based courses. On LinkedIn Learning alone, there are 400+ courses related to data science. Choosing the best among so many courses is a challenge for newcomers. Moreover, the theory-based courses require solid mathematical knowledge, especially in calculus and linear algebra.

Data science online platforms: Platforms like Kaggle offer playground and prize-based competitions. Junior data scientists can learn a lot by applying their knowledge and by reading kernels, which data scientists write to share their experience. However, poorly written code and a lack of documentation can be frustrating for newcomers who want to understand what happens behind the scenes.

Manuals of well-known data science frameworks: Many open-source frameworks provide industry-proven implementations of methods that data scientists use. These frameworks often don’t share the same syntax, so data scientists may need to learn new syntax each time they switch to a new framework.

Learning from senior data scientists: Onboarding junior data scientists may require time which senior data scientists don’t always have.

Data scientists may use AutoML to produce multiple types of models as an alternative to digging deep into the data and gaining new knowledge. AutoML can create a large number of models. However, it doesn’t guarantee that the user gets a model that satisfies the quality requirements. Testing a wide range of hyperparameter values takes a long time, and model reproducibility can be an issue when models are created using AutoML.

Who is ML-Navigator for?

Data science new joiners

A new joiner is a person who wants to move into data science from a different discipline. A new joiner can also be a person who wants to be part of the data team without being a full-time data scientist, e.g., a developer with sufficient coding skills. ML-Navigator gives new joiners a path to analyzing real data. It helps the user navigate through predefined flows, which are end-to-end data science pipelines. The user can load a specific flow and follow its instructions, from reading the data to training the model. The user can start with the most straightforward flow and later move to more complicated flows to train more accurate models if needed.

Senior data scientists

Experienced data scientists may be interested in automating many processes that they follow frequently. They can build a flow for each specific problem type. The flow can be created from scratch or by modifying or combining other flows. They can share their flows with the community and exchange their experience with other data scientists.
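
The idea of a flow as an ordered sequence of steps, which can be built from scratch or by combining other flows, can be sketched in plain Python. This is only an illustration under stated assumptions: the names `Step`, `Flow`, `run_flow`, and the step functions are hypothetical and are not ML-Navigator's actual API, and the tiny in-memory dataset stands in for real data.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical types for illustration only; not ML-Navigator's actual API.
State = Dict[str, Any]
Step = Callable[[State], State]
Flow = List[Step]


def read_data(state: State) -> State:
    """Load a tiny in-memory dataset (a stand-in for reading a real file)."""
    new_state = dict(state)
    new_state["rows"] = [(1.0, 2.0), (2.0, 4.0), (3.0, None), (4.0, 8.0)]
    return new_state


def drop_missing(state: State) -> State:
    """Remove rows whose target value is missing."""
    new_state = dict(state)
    new_state["rows"] = [(x, y) for x, y in state["rows"] if y is not None]
    return new_state


def train_model(state: State) -> State:
    """Fit y = slope * x by least squares through the origin."""
    rows = state["rows"]
    slope = sum(x * y for x, y in rows) / sum(x * x for x, _ in rows)
    new_state = dict(state)
    new_state["slope"] = slope
    return new_state


def run_flow(flow: Flow, state: Optional[State] = None) -> State:
    """Run each step in order, passing the state from one step to the next."""
    current = dict(state or {})
    for step in flow:
        current = step(current)
    return current


# Small flows that each cover one stage of the pipeline.
loading_flow: Flow = [read_data]
cleaning_flow: Flow = [drop_missing]
training_flow: Flow = [train_model]

# Combining flows is plain list concatenation.
full_flow: Flow = loading_flow + cleaning_flow + training_flow

result = run_flow(full_flow)
print(result["slope"])  # prints 2.0
```

Representing a flow as a plain list keeps modification and combination down to ordinary list operations, such as inserting an extra cleaning step before `train_model`.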

ML-Navigator for enterprises

ML-Navigator can standardize the data science experience in large enterprises. Junior data scientists can be productive and efficient from the first day. The onboarding process can be fast and concrete rather than abstract.