Chapter 2 Data sources

2.1 Inside Airbnb

The data is about 2020 Airbnb Listings in New York City. We download data from the Inside Airbnb [website]http://insideairbnb.com/get-the-data.html.

The Inside Airbnb project was brought to you by Murray Cox. He is an independent digital storyteller, community activist and technologist, conceived the project, compiled and analyzed the data and built the site. More story about Mr.Cox and his website:

"What began as a simple class project turned into an obsession that spawned the website Inside Airbnb. Cox now spends about 10 hours a week parsing statistics from listings in more than 100 cities and fielding half a dozen queries from academics and journalists around the world. Using publicly available information, Inside Airbnb gives a view into how many listings there are in a certain zip code, how many are entire private homes versus a room in a home, the price and the number of reviews each has received.It’s valuable information for cities that are trying to crack down on entire networks of managed Airbnb units or serial renters whose practice eliminates apartments that would be otherwise available for people looking for a place to live.

Cox receives payments from some cities, including $200 a month from San Francisco, as well as the hotel trade association, and researchers. Maintaining the website costs him about $10,000 a year and the payments usually cover the bills, he says. In the past year, he has been flown to Barcelona, Australia, and Paris to speak at various home-sharing events about his findings." –[Yahoo Finance]https://finance.yahoo.com/news/meet-murray-cox-man-trying-100014100.html.

2.2 What We Chose

Thus, we are convinced that this website and the datasets are reliable because city officials are also using them for real estate management. We chose the data compiled on Oct.24 in the New York Metropolitan area.

The raw Listings dataset contains 74 variables, with slightly more categorical variables than numeric ones. Since the CSV data is well-established we will directly load data into R after we choose what variables to keep.

There are some listings with zero rental price which does not make sense in our analysis, therefore, we removed those data points. We also found there are listings with “availability_365” = 0, which means that these properties are for long-term rental. For this project we will focus on vacation rental, so we removed those values as well.