Airbnb Automated Classification System
Machine learning techniques across the full range of complexity are used to analyze datasets. Unsupervised classification can enhance our understanding of a system through analysis of the classification outcome. While humans are good at classifying things visually, they struggle when many dimensions are involved, and machine learning can help close this gap. This project applies an unsupervised classification technique to Airbnb listings in New York City with the purpose of developing an automated categorization and rating system for listings, to be used by consumers and by Airbnb itself.
Introduction
This project explores the possibility of creating a machine-learning-driven categorization system for Airbnb listings in New York City: a rating system similar to those a guidebook might provide, giving a tourist an idea of what kinds of amenities are found around the listing. The project focuses on a combination of neighborhood attributes in the vicinity of the listing and some attributes of the listing itself. The resulting classification system could be useful to tourists and Airbnb alike. Using the system, Airbnb customers in New York City would get a better idea of the amenities available in the neighborhood around their rental, and Airbnb could use the analysis to better understand its listing inventory and customer preferences. Is there stronger demand for cheaper listings? For listings with better public transit connectivity? For listings close to certain kinds of amenities? There are several rich possible applications.
Data & Methods
A. Airbnb Listings
The Airbnb listings for New York City were collected from InsideAirbnb.com,[2] a website run by a New York City-based housing activist named Murray Cox. The website scrapes Airbnb’s website for many cities around the world, creating snapshots of all listings on the site in a city on a given day. The data used in this analysis was from the scrape of New York City on March 2nd, 2017.
B. Outliers
To make sure the analysis was performed on Airbnb listings that are actually being rented, we removed certain outliers. We kept only the listings that meet all of the following criteria:
- Minimum stay of 7 nights or fewer per booking
- Listing price of $500 or less
- At least 1 review
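
The three criteria above can be sketched as a simple filter. This is a minimal illustration, not the project's actual code: the `Listing` class, its field names (`minimum_nights`, `price`, `number_of_reviews`), and the sample values are assumptions made for the example, and the price is assumed to be already parsed into a number of dollars.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    # Hypothetical record; field names are assumptions for this sketch.
    listing_id: int
    minimum_nights: int
    price: float  # nightly price in dollars, assumed already numeric
    number_of_reviews: int


def is_active(listing: Listing) -> bool:
    """Return True when a listing meets all three criteria."""
    return (
        listing.minimum_nights <= 7
        and listing.price <= 500
        and listing.number_of_reviews >= 1
    )


def filter_listings(listings: list[Listing]) -> list[Listing]:
    """Keep only the listings that pass every criterion."""
    return [listing for listing in listings if is_active(listing)]


# Made-up sample rows, for illustration only (not real listings).
sample = [
    Listing(1, minimum_nights=2, price=120.0, number_of_reviews=15),
    Listing(2, minimum_nights=30, price=90.0, number_of_reviews=4),   # stay too long
    Listing(3, minimum_nights=3, price=750.0, number_of_reviews=8),   # too expensive
    Listing(4, minimum_nights=7, price=500.0, number_of_reviews=0),   # no reviews
]

kept = filter_listings(sample)
kept_ids = [listing.listing_id for listing in kept]
```

Both thresholds are inclusive here, matching the "or fewer" and "or less" wording of the criteria, so a listing priced at exactly $500 with a 7-night minimum passes those two checks.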
C. Custom Attributes
In addition to the attributes collected by Inside Airbnb, we added four custom attributes to each listing, which we developed using publicly available data. They are:
- Median Household Income
- Craft Beer Count
- Specialty Coffee Count
- Connectivity Score
You can read more about how these custom attributes were built from different data sources and geospatial operations in the paper <Airbnb_Paper.pdf> and in the Jupyter Notebooks.