Spatial Data Science with Open Data and Social Media

Workshop / Overview

Our cities of the 21st century generate massive amounts of geo-referenced multi-modal data: from open or governmental data, to the digital traces that we leave in online services like Foursquare or Uber.
Learn the basic techniques of spatial data science to analyze such datasets and apply them to the real-world problem of predicting Airbnb prices in a neighborhood.

Theory:
Familiarize participants with the challenges of working with geo-referenced, multi-modal data:
- Tobler’s first law of geography, coordinate systems, projections
- Types of maps (e.g., choropleth) and discretizations (e.g., natural breaks), and how they can be used to lie
- Data types, formats, and sources
- Types of spatial data analysis: ESDA, spatial modeling, etc.
- Problems: MAUP, ecological fallacy, spatial autocorrelation
- Measuring proximity: nearest neighbors, weight matrices
- Global measure of spatial autocorrelation: Moran’s I
- Spatial autoregressive models (spatial lag vs. spatial error models, or a combination of both)
- Applications: urban computing, crime prediction, accident hotspots, etc.
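To make two of the topics above concrete (weight matrices and Moran’s I), here is a minimal sketch in plain Python. It is not the workshop's own code: the helper names `knn_weights` and `morans_i`, the 6x6 toy grid, and the choice of k = 4 are assumptions made for illustration. In practice a library such as pysal would normally be used instead of a hand-written version.

```python
import math
import random


def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbor weights.

    Returns one dict per observation, mapping neighbor index -> weight.
    (Hypothetical helper, written for illustration only.)
    """
    weights = []
    for i, (xi, yi) in enumerate(coords):
        # Euclidean distance from observation i to every other observation
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        # Each of the k nearest neighbors gets weight 1/k, so rows sum to 1
        weights.append({j: 1.0 / k for _, j in dists[:k]})
    return weights


def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i^2)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row.values()) for row in weights)  # sum of all weights
    cross = sum(
        w * z[i] * z[j] for i, row in enumerate(weights) for j, w in row.items()
    )
    return (n / s0) * cross / sum(d * d for d in z)


# Toy data (assumed): a 6x6 grid of locations
coords = [(x, y) for x in range(6) for y in range(6)]
# A smooth gradient: nearby locations have similar values
smooth = [float(x + y) for x, y in coords]
# The same values, randomly reassigned to locations
shuffled = smooth[:]
random.Random(0).shuffle(shuffled)

w = knn_weights(coords, k=4)
i_smooth = morans_i(smooth, w)
i_shuffled = morans_i(shuffled, w)
print(f"Moran's I, smooth gradient: {i_smooth:.3f}")
print(f"Moran's I, shuffled values: {i_shuffled:.3f}")
```

The smooth surface should give a clearly positive Moran’s I, because neighboring locations have similar values (Tobler’s first law). Shuffling the same values across locations destroys that structure, so the statistic should fall to a much lower value.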

Application:
We will tackle a real-world problem and apply the concepts above to the following question: How are Airbnb prices influenced by the properties of a listing and the attributes of its neighborhood?

Datasets used (they will be made available; the original sources are listed here):
- Airbnb listings (point level): http://insideairbnb.com/
- Census (census tract level): https://www.census.gov/
- Crime (originally point level, aggregated to census tract level): https://opendata.cityofnewyork.us/
- Foursquare venues (originally point level, aggregated to census tract level): https://developer.foursquare.com/
- Streetscore values (originally point level, aggregated to census tract level): http://streetscore.media.mit.edu/
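Several of these datasets start as points and are then aggregated to census tracts. The sketch below shows one simple way to do that: a point-in-polygon test followed by a count per area. It is an illustration under stated assumptions, not the preprocessing actually used for the workshop. The tract names, polygon coordinates, and incident locations are made up, and a real pipeline would typically use a spatial join from a GIS library on projected coordinates.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    `polygon` is a list of (x, y) vertices in order.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_per_area(points, areas):
    """Count how many points fall inside each named polygon."""
    counts = {name: 0 for name in areas}
    for x, y in points:
        for name, polygon in areas.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break  # assign each point to at most one area
    return counts


# Hypothetical "tracts": two adjacent rectangles (made-up coordinates)
tracts = {
    "tract_a": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "tract_b": [(2, 0), (5, 0), (5, 2), (2, 2)],
}
# Hypothetical point-level records, e.g. incident locations
incidents = [(0.5, 0.5), (1.5, 1.0), (3.0, 1.0), (4.5, 0.2), (2.5, 1.9), (6.0, 1.0)]

counts = count_points_per_area(incidents, tracts)
print(counts)  # {'tract_a': 2, 'tract_b': 3}; the point at (6.0, 1.0) is in no tract
```

The choice of areal unit matters here: counting the same points with different boundaries can change the resulting pattern, which is the MAUP mentioned in the theory part.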

Workshop / Outcome

Workshop / Difficulty

Beginner level

Workshop / Prerequisites

Participants are expected to bring a laptop running Jupyter Notebook. We highly recommend installing it via Anaconda, which conveniently installs Python, the Jupyter Notebook, and other commonly used packages for scientific computing and (spatial) data science (pysal, numpy, pandas, matplotlib, ...).
Code and datasets will be available prior to the workshop, and participants are expected to download them from GitHub.

Track / Co-organizers

Cristina Kadar

Senior Data Scientist, NZZ

Benjamin Ryder

PhD Student, ETH Zurich

AMLD EPFL 2018 Spatial Data Science with Open Data and Social Media