We will introduce tools for visualising spatial data, extracting features and feature engineering, and explain the logic and steps needed to build these features. The techniques will be presented in the context of a simple machine learning model for predicting taxi tariffs, and we will illustrate how these new features can improve it.
Participants will learn how to:
- Visualise spatial data for exploratory data analysis
- Calculate distances
- Extract routes, route distances and estimated durations
- Extract city features using open-source tools, e.g. OpenStreetMap
- Improve data processing performance with spatial indices in Python
- Approach traffic estimation (ideas)
- Analyse GPS data: approaches, their potential, and how they can help with feature engineering
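As a taste of the "calculate distances" topic, here is a minimal sketch of a great-circle (haversine) distance between two latitude/longitude points. It is an illustration only and not necessarily the method used in the workshop repository; the function name `haversine_km` and the mean Earth radius constant are assumptions made for this example.

```python
import math

# Assumed constant for this sketch: mean Earth radius in kilometres.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)

    # Haversine formula; 'a' is the squared half-chord length between the points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Guard against tiny floating-point overshoot above 1.0 before asin.
    a = min(1.0, a)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # Hypothetical pickup and drop-off coordinates, for illustration only.
    pickup = (40.7580, -73.9855)
    dropoff = (40.6413, -73.7781)
    print(f"{haversine_km(*pickup, *dropoff):.2f} km")
```

A straight-line distance like this is a cheap baseline feature for a tariff model; route distances, covered by the next topic in the list, are usually longer because they follow the street network.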
We will download and run the project locally on your computer, using a set of tools that ensure reproducibility. Participants will be able to take the code with them and apply it to their next project.
Participants are expected to learn the main techniques of spatial data analysis and to be able to use the tools presented in the workshop in their next projects.
Beginner level
- Basic to intermediate knowledge of Python (desired)
- Docker + make + wget (see README file in project repository)
- Install the repository before the workshop, following the README file in the workshop's GitHub repository. We will use the first 30 minutes to set things up, but it is better to arrive with everything already installed.
- Workshop repository: https://github.com/caiomiyashiro/geospatial_data_analysis/tree/master/AMLD-2020