Our cities of the 21st century generate massive amounts of geo-referenced multi-modal data: from open or governmental data, to the digital traces that we leave in online services like Foursquare or Uber.
Learn the basic techniques of spatial data science to analyze such datasets and apply them to the real-world problem of predicting Airbnb prices in a neighborhood.
Theory:
Familiarize participants with the challenges of working with geo-referenced, multi-modal data:
- Tobler’s first law of geography, coordinate systems, projections
- Types of maps (e.g., choropleth) and discretizations (e.g., natural breaks), and how they can be used to lie
- Data types, formats, and sources
- Types of spatial data analysis: ESDA, spatial modeling, etc.
- Problems: the modifiable areal unit problem (MAUP), ecological fallacy, spatial autocorrelation
- Measuring proximity: nearest neighbors, spatial weights matrices
- Global measure of spatial autocorrelation: Moran’s I
- Spatial autoregressive models (spatial lag vs. spatial error models, or a combination of both)
- Applications: urban computing, crime prediction, accident hotspots, etc.
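To make the weights matrix and Moran’s I from the list above concrete, here is a minimal sketch using only numpy. It is not the workshop's own code: the helper names `rook_weights` and `morans_i` and the 4x4 toy grids are assumptions made for illustration, and in practice a library such as pysal would build the weights and compute the statistic (including its significance).

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Binary rook-contiguity weights for a regular grid.

    Two cells are neighbors (weight 1) if they share an edge.
    Hypothetical helper for illustration only.
    """
    n = n_rows * n_cols
    w = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if r + 1 < n_rows:  # neighbor below
                j = (r + 1) * n_cols + c
                w[i, j] = w[j, i] = 1.0
            if c + 1 < n_cols:  # neighbor to the right
                j = r * n_cols + (c + 1)
                w[i, j] = w[j, i] = 1.0
    return w

def morans_i(values, w):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), with z the mean-centered values."""
    z = values - values.mean()
    n = len(values)
    s0 = w.sum()  # sum of all weights
    return (n / s0) * (z @ w @ z) / (z @ z)

w = rook_weights(4, 4)

# Toy pattern 1: top half low, bottom half high (similar values cluster).
clustered = np.repeat([0.0, 0.0, 1.0, 1.0], 4)

# Toy pattern 2: checkerboard (every neighbor has the opposite value).
checkerboard = np.indices((4, 4)).sum(axis=0).ravel() % 2.0

i_clustered = morans_i(clustered, w)
i_checkerboard = morans_i(checkerboard, w)
print(f"clustered:    {i_clustered:.3f}")
print(f"checkerboard: {i_checkerboard:.3f}")
```

The clustered pattern yields a positive value (near things are more alike, in the spirit of Tobler’s first law), while the checkerboard yields a strongly negative one. A value close to the expectation under no autocorrelation, -1/(n-1), would indicate a spatially random pattern.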
Application:
We will tackle a real-world problem and apply the concepts above to the following question: How are Airbnb prices influenced by the properties of a listing and the attributes of its neighborhood?
Datasets used (processed versions will be made available; the original sources are listed here):
- Airbnb listings (at the point level) http://insideairbnb.com/
- Census (at the census tract level) https://www.census.gov/
- Crime (originally at the point level, aggregated to the census tract level) https://opendata.cityofnewyork.us/
- Foursquare venues (originally at the point level, aggregated to the census tract level) https://developer.foursquare.com/
- Streetscore values (originally at the point level, aggregated to the census tract level) http://streetscore.media.mit.edu/
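As a rough illustration of what processing point-level data to the census tract level can look like, the sketch below counts points per tract with pandas and attaches the counts to listings. Everything in it is assumed for the example: the column names (`tract_id`, `crime_count`, `price`), the made-up toy rows, and the premise that each point has already been assigned to a tract (for instance through a spatial join, which is not shown). The datasets provided for the workshop may be structured differently.

```python
import pandas as pd

# Made-up toy data: each crime record already carries the ID of the
# census tract it falls in (hypothetical column name "tract_id").
crimes = pd.DataFrame({
    "tract_id": ["A", "A", "B", "C", "C", "C"],
})

# Made-up toy listings, also tagged with a tract ID.
listings = pd.DataFrame({
    "listing_id": [1, 2, 3],
    "tract_id": ["A", "B", "D"],
    "price": [120, 80, 200],
})

# Aggregate from point level to tract level: one row per tract.
crime_counts = (
    crimes.groupby("tract_id")
    .size()
    .rename("crime_count")
    .reset_index()
)

# Attach the tract-level attribute to each listing. A left join keeps
# listings in tracts without any recorded crime; those get a count of 0.
merged = listings.merge(crime_counts, on="tract_id", how="left")
merged["crime_count"] = merged["crime_count"].fillna(0).astype(int)

print(merged)
```

The left join is a deliberate choice here: an inner join would silently drop listings located in tracts that have no points in the aggregated dataset.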
Beginner level
- Participants are expected to bring a laptop running Jupyter Notebook. We highly recommend installing it via Anaconda, which conveniently installs Python, the Jupyter Notebook, and other commonly used packages for scientific computing and (spatial) data science (pysal, numpy, pandas, matplotlib, ...).
- Code and datasets will be available prior to the workshop, and participants are expected to download them from GitHub.