Ensemble learning methods, which often appear in the top rankings of machine learning competitions (including those hosted on Kaggle), help solve these problems by combining several weak learners to achieve better performance.
In this workshop we will go through the theory behind the most common types of ensemble methods, namely Bagging, Boosting, and Stacking, and their respective applications. Two methods, Random Forest and AdaBoost, will be presented in detail, with a preliminary introduction to Decision Trees. Participants will then apply these ensemble techniques in a concrete scenario, implementing an end-to-end machine learning project using scikit-learn.
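As a preview of the hands-on part, the sketch below shows a minimal scikit-learn workflow of the kind participants will build. The bundled Iris dataset stands in for the workshop data, and the model choices and hyperparameters are illustrative only, not the workshop's prescribed settings.

```python
# Minimal sketch of a scikit-learn ensemble workflow; the Iris dataset
# is a stand-in for the workshop data, and hyperparameters are illustrative.
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Bagging-style ensemble: many decorrelated decision trees, majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
print("Random Forest accuracy:", accuracy_score(y_test, forest.predict(X_test)))

# Boosting-style ensemble: weak learners trained sequentially on
# reweighted versions of the training data.
boost = AdaBoostClassifier(n_estimators=100, random_state=42)
boost.fit(X_train, y_train)
print("AdaBoost accuracy:", accuracy_score(y_test, boost.predict(X_test)))
```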
At the end of the workshop, participants will be familiar with the most commonly used ensemble techniques and will be able to correctly implement, train, test, and evaluate the Decision Tree, Random Forest, and AdaBoost algorithms. They will also understand how noise in the data and the right choice of hyperparameters affect the performance of ML algorithms. Finally, participants will learn to evaluate the importance of each feature in the final prediction.
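For the last point, a minimal sketch of how feature importance can be inspected with scikit-learn follows; again the Iris dataset and the model settings are assumptions for illustration.

```python
# Sketch of feature-importance inspection with a Random Forest;
# dataset and hyperparameters are illustrative placeholders.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(data.data, data.target)

# feature_importances_ reports each feature's share of the total impurity
# reduction accumulated across all trees (the values sum to 1).
for name, score in zip(data.feature_names, forest.feature_importances_):
    print(f"{name}: {score:.3f}")
```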
Beginner level
- Python
- Jupyter Notebook
- Familiarity with data cleaning
- Cross-validation concept (although it will be briefly explained)
- Own laptop