“It is important to understand what you CAN DO before you learn to measure how WELL you seem to have DONE it.”
John W. Tukey
Data preparation and exploration are the most time-consuming parts of machine learning modeling, and they also strongly influence the performance of the models. Some steps in data cleaning and exploration are must-dos, while others depend heavily on the domain.
In this workshop we will go through various techniques for preparing data for machine learning modeling. We will not focus on the modeling itself; instead, we will choose a linear model and show how its performance can be improved by better understanding the data and improving data quality.
The workshop is split into two sessions: the first focuses on basic data exploration and the second on advanced exploration. Participants can attend both sessions or just one. Beginners should, however, attend the first session, as it prepares them for the more advanced one.
At the end of the workshop, participants will at best be familiar with the various steps in data exploration and data cleaning needed to prepare structured data for machine learning modeling.
At worst, participants will have a bird's-eye view of various data exploration techniques.
Beginner level
- Basic Python and statistics knowledge
- A working Python installation on your own laptop, with Jupyter Notebook installed