Background
Introduction
Data science attracts smart people with different backgrounds, experience, and domain knowledge. The transition from another discipline into data science is not straightforward. It takes time and effort, and how much depends on the person's motivation, their background, and the industry in which they want to work. Newcomers to data science follow several paths to sharpen their skills. Some of these paths are:
- Online courses: Some e-learning platforms, e.g., LinkedIn Learning, provide practical courses that solve specific data science problems. Other platforms, such as Coursera, offer theory-based courses. On LinkedIn Learning alone, there are 400+ courses related to data science. Selecting the best courses among so many is a challenge for newbies. Moreover, the theory-based courses require solid mathematical knowledge, especially in calculus and linear algebra.
- Data science online platforms: Platforms such as Kaggle offer playground and prize-based competitions. Junior data scientists can learn a lot by applying their knowledge and by reading kernels, which data scientists write to share their experience. However, poorly written code and a lack of documentation can be frustrating for newbies who want to learn what happens behind the scenes.
- Manuals of well-known data science frameworks: Many open-source frameworks provide industry-proven implementations of the methods that data scientists use. Many of these frameworks don’t share the same syntax, so data scientists may need to learn a new syntax each time they switch to a new framework.
- Learning from senior data scientists: Onboarding junior data scientists may require time which senior data scientists don’t always have.
Data scientists may use AutoML to produce multiple types of models as an alternative to digging deep into the data and gaining new knowledge. AutoML can create a large number of models. However, it doesn’t guarantee that the user gets a model that satisfies the quality requirements. It also needs a long time to test a wide range of hyperparameter values. In addition, model reproducibility can be an issue when creating models using AutoML.
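As a rough illustration of the search cost and the reproducibility concern, here is a minimal sketch in plain Python. It does not use any real AutoML library. The search space, the parameter names, and the assumed two minutes per model fit are all hypothetical values chosen only for the example.

```python
import random

# Hypothetical search space; names and values are illustrative assumptions.
search_space = {
    "learning_rate": [0.001, 0.01, 0.1, 0.3],
    "max_depth": [3, 5, 7, 9, 11],
    "n_estimators": [100, 200, 500, 1000],
    "subsample": [0.6, 0.8, 1.0],
}


def count_candidates(space):
    """Number of models an exhaustive grid search would have to train."""
    total = 1
    for values in space.values():
        total *= len(values)
    return total


def sample_candidates(space, n, seed):
    """Draw n random configurations; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()}
            for _ in range(n)]


n_candidates = count_candidates(search_space)  # 4 * 5 * 4 * 3 = 240
assumed_minutes_per_fit = 2                    # assumption, not a measurement
total_hours = n_candidates * assumed_minutes_per_fit / 60

print(f"{n_candidates} candidate models, about {total_hours:.1f} hours to fit them all")

# The same seed yields the same configurations. Without a recorded seed,
# two runs may explore different candidates and return different models.
run_a = sample_candidates(search_space, 5, seed=42)
run_b = sample_candidates(search_space, 5, seed=42)
print("Seeded runs identical:", run_a == run_b)
```

Even this small space with four hyperparameters yields 240 candidates, and each added hyperparameter multiplies that count. Recording the seed and the sampled configurations is one simple way to make a search repeatable, although a real AutoML tool may have other sources of nondeterminism.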