Toronto House Prices: A Hedonic Geospatial Analysis
Summary:
- Created a tool that estimates house prices (RMSLE ~ 0.1325) to help buyers, sellers, and agents negotiate, taking into account external factors such as neighborhood income, nearby crime, and distance to the nearest hospital, transit stop, and park.
- Gathered data from multiple sources: listings from Zoocasa.ca as the primary source, and external factors from the Toronto Open Data Portal, Airbnb.ca, and the Toronto Police Service Data Portal.
- Engineered 15 features from 8 different datasets using Python.
- Implemented geospatial proximity joins with the k-d tree (K-dimensional tree) implementation in ‘scipy’ to engineer features from shapefiles.
- Used hedonic spatial regression to estimate price elasticities, i.e. how much the final price changes when a feature increases by 10%.
- Designed a stacked meta-model combining Elastic Net (ENET), gradient boosting decision tree (GBDT), and Kernel Ridge Regression (KRR) models, with linear regression as our base model, reaching 85% accuracy.
- Devised a final model as a weighted average of the stacked model, LightGBM, and XGBoost (70%, 15%, and 15% respectively), reaching 87% accuracy.
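
The proximity joins mentioned in the summary can be illustrated with a minimal sketch. It assumes points are given as (latitude, longitude) pairs in degrees and uses a simple equirectangular projection around Toronto's latitude so that k-d tree distances come out in kilometres. The function names, the reference latitude, and the coordinates are illustrative assumptions and are not taken from the project code.

```python
import numpy as np
from scipy.spatial import cKDTree

EARTH_RADIUS_KM = 6371.0


def to_xy_km(lat_deg, lon_deg, ref_lat_deg):
    """Equirectangular projection to planar km; adequate at city scale."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    x = EARTH_RADIUS_KM * lon * np.cos(np.radians(ref_lat_deg))
    y = EARTH_RADIUS_KM * lat
    return np.column_stack([x, y])


def nearest_amenity(houses, amenities, ref_lat_deg=43.7):
    """For each house, return (distance in km, index) of the nearest amenity."""
    houses = np.asarray(houses, dtype=float)
    amenities = np.asarray(amenities, dtype=float)
    tree = cKDTree(to_xy_km(amenities[:, 0], amenities[:, 1], ref_lat_deg))
    return tree.query(to_xy_km(houses[:, 0], houses[:, 1], ref_lat_deg), k=1)


def count_within(houses, events, radius_km, ref_lat_deg=43.7):
    """For each house, count events (e.g. crimes) within radius_km."""
    houses = np.asarray(houses, dtype=float)
    events = np.asarray(events, dtype=float)
    tree = cKDTree(to_xy_km(events[:, 0], events[:, 1], ref_lat_deg))
    hits = tree.query_ball_point(
        to_xy_km(houses[:, 0], houses[:, 1], ref_lat_deg), r=radius_km
    )
    return np.array([len(ix) for ix in hits])


# Made-up coordinates, for illustration only.
hospitals = np.array([[43.6590, -79.3870], [43.7220, -79.3760]])
houses = np.array([[43.6600, -79.3900], [43.7300, -79.3800]])

dist_km, idx = nearest_amenity(houses, hospitals)
nearby = count_within(houses, hospitals, radius_km=2.0)
```

Building the tree once per amenity dataset and querying all houses in a single call keeps the join fast compared with computing every pairwise distance.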

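The "10% increase" reading of the hedonic regression can be made concrete with a short sketch. It assumes a log-log specification, ln(price) = a + b * ln(feature) + ..., in which b is the elasticity. That specification, the helper name, and the example elasticities are assumptions for illustration; they are not estimates from this project.

```python
def pct_price_change(elasticity, feature_increase=0.10):
    """Percent change in price when a feature grows by `feature_increase`.

    In a log-log model, scaling the feature by (1 + g) scales the price
    by (1 + g) ** elasticity.
    """
    return ((1.0 + feature_increase) ** elasticity - 1.0) * 100.0


# Hypothetical elasticities, for illustration only.
effect_positive = pct_price_change(0.5)   # feature that raises price
effect_negative = pct_price_change(-0.2)  # feature that lowers price
```

For small elasticities the result is close to 10 times the coefficient, which is why coefficients in a log-log model are often read directly as "percent change per percent change".
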
Resources:
- Python Version: 3.7
- Packages: pandas, numpy, sklearn, scipy
- Visualizations: Tableau, matplotlib, seaborn, brewer, folium
- Techniques: Linear Regression, Elastic Net, Kernel Ridge, Gradient Boosting Decision Tree, LightGBM, XGBoost, RandomizedSearchCV.
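
The 70%-15%-15% blend and the RMSLE metric from the summary can be sketched as follows. The sketch assumes RMSLE is computed with log(1 + x) and that the blend is applied directly to the predicted prices; whether the project blended on the price or the log-price scale is not stated, so both choices are assumptions, as are the function names and the numbers.

```python
import numpy as np


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, using log(1 + x)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


def blend(stacked, lgbm, xgb, weights=(0.70, 0.15, 0.15)):
    """Weighted average of three models' predictions, one weight per model."""
    preds = np.vstack([stacked, lgbm, xgb])
    return np.average(preds, axis=0, weights=weights)


# Made-up predictions, for illustration only.
actual = np.array([800_000.0, 1_200_000.0])
final = blend(
    stacked=np.array([790_000.0, 1_150_000.0]),
    lgbm=np.array([820_000.0, 1_250_000.0]),
    xgb=np.array([810_000.0, 1_230_000.0]),
)
score = rmsle(actual, final)
```
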
The full code is available in the GitHub repository.