Urban Informatics: Web Scraping with Python

Photo of a sticker on the telephone pole across the street from SCI-Arc, at One Santa Fe

COURSE: URP 535 (INTRO TO URBAN INFORMATICS) FALL 2021

PROFESSOR: Anthony Vanky

PROGRAMS USED: Python via Jupyter Notebooks, Tableau, Yelp API

PROJECT SUMMARY:
This was the final project for the course. We were asked to conduct an exploratory analysis of a topic related to the course and to apply the programming skills we had developed during the semester. The main topics of the semester were smart cities, citizen science, the politics of data, governance structures, and public policy. We also learned to use Python in Jupyter notebooks to extract or connect to data, clean it, analyze it, and then create infographics that highlight observations and illustrate points. Later in the semester we learned to use tools such as Tableau to aid in that process.

QUESTION:
The City of Los Angeles often raises the topic of improving its walkability. Actions such as expanding the Metro system and improving the connections between lines illustrate the city's efforts, but one important factor has yet to be mentioned: public restrooms. I think the new lines under development will significantly increase ridership once they are complete, and with that I suspect there will be a greater need for public restrooms within walking distance of Metro stations. With this in mind, I wanted to investigate whether there are enough public restrooms to meet the demand of Los Angeles Metro riders, and whether that question could be answered using the Yelp API. To clarify, by public restrooms I mean restrooms located in businesses that open them to the public or, more likely, to paying customers.

PROCESS/DATA USED
Since the city of Los Angeles is roughly 25 miles wide and the maximum search radius in the Yelp API is 40,000 meters (roughly 25 miles), I picked three locations that I thought would together cover the whole city. Rather than writing a loop that ran for a set amount of time, I wrote one that ran a set number of times. In short, I created a loop within a loop: the inner loop ran a request for each location, and the outer loop decided how many times the inner loop ran. This process produced several data frames, which I later combined into one, removing duplicates in Python. To determine whether the number of restrooms met demand, I looked for ridership data and found some on the LA Metro website. Unfortunately, that data was by line rather than by station, so I didn't think it would help me answer the question of demand. I also struggled to filter the businesses by business type in Python, so I resorted to Tableau for that part of the process. Here is a link to a short video I did for this project that goes over this.
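The loop-within-a-loop idea can be sketched roughly as follows. This is a minimal illustration, not the notebook code from the project: the search centers, the function names, and the choice to have each outer round request the next page of results are all assumptions, and it uses plain Python dictionaries rather than data frames so that it runs with the standard library alone. With pandas, the combining step would instead be `pd.concat` followed by `drop_duplicates(subset="id")`.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.yelp.com/v3/businesses/search"
MAX_RADIUS_M = 40_000  # largest radius the search endpoint accepts
PAGE_SIZE = 50         # largest page size the search endpoint accepts

# Hypothetical search centers; the three points actually used are not listed here.
LOCATIONS = [(34.05, -118.25), (34.20, -118.45), (33.95, -118.40)]


def build_params(term, lat, lon, round_index):
    """Query parameters for one request (assumption: each round asks for the next page)."""
    return {
        "term": term,
        "latitude": lat,
        "longitude": lon,
        "radius": MAX_RADIUS_M,
        "limit": PAGE_SIZE,
        "offset": round_index * PAGE_SIZE,
    }


def make_yelp_fetch(api_key):
    """Return a function that sends one search request and returns the parsed JSON."""
    def fetch(params):
        url = SEARCH_URL + "?" + urllib.parse.urlencode(params)
        request = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {api_key}"}
        )
        with urllib.request.urlopen(request, timeout=10) as response:
            return json.load(response)
    return fetch


def collect_businesses(fetch, locations, term, n_rounds):
    """Outer loop: number of rounds. Inner loop: one request per location."""
    unique = {}
    for round_index in range(n_rounds):
        for lat, lon in locations:
            payload = fetch(build_params(term, lat, lon, round_index))
            for business in payload.get("businesses", []):
                # Overlapping search circles return the same business more
                # than once; keying on the Yelp id keeps the first copy only.
                unique.setdefault(business["id"], business)
    return list(unique.values())
```

With a real key, a call might look like `collect_businesses(make_yelp_fetch("YOUR_API_KEY"), LOCATIONS, "restroom", n_rounds=3)`. Passing the fetch function in as an argument also makes it easy to try the loop against canned responses before spending API calls.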

DIAGRAMS
With the data I collected through the Yelp API, I realized that the search term I used in a request changed which businesses were returned, as illustrated in the bar charts above. The unfiltered results showed that most of the businesses were plumbers or contractors. Once these two business types were filtered out, the data revealed that Parks service had some of the highest hit counts. However, a closer look shows that restaurants were divided by food type, such as New American or Traditional American; if all of these results were combined, restaurants would have ranked higher. The data also suggested some patterns in the language that different business types use. The term “toilet” was most commonly used among plumbing businesses, while “restroom” was more common among restaurants and service stations. Lastly, “bathroom” was most commonly used by contractors, kitchen and bath, and refurbishing businesses.
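The filtering and regrouping described here was done in Tableau, but a rough Python equivalent might look like the sketch below. It assumes each business record carries a `categories` list of `{"title": ...}` entries, as Yelp search results do; the category titles in the two sets and the function names are illustrative choices, not the lists used in the project.

```python
from collections import Counter

# Business types to drop before counting (illustrative titles).
EXCLUDED_TITLES = {"Plumbing", "Contractors"}

# Food-type categories to fold into one "Restaurants" bucket (illustrative titles).
RESTAURANT_TITLES = {"American (New)", "American (Traditional)", "Mexican"}


def normalize_title(title):
    """Map any restaurant food type to a single combined label."""
    return "Restaurants" if title in RESTAURANT_TITLES else title


def count_business_types(businesses):
    """Count businesses per type, skipping excluded types entirely."""
    counts = Counter()
    for business in businesses:
        titles = {c["title"] for c in business.get("categories", [])}
        if titles & EXCLUDED_TITLES:
            continue  # drop plumbers and contractors
        # A set means a business tagged with two food types counts once.
        counts.update({normalize_title(t) for t in titles})
    return counts
```

Calling `count_business_types(rows).most_common(10)` on the collected results would give a ranking comparable to the bar charts, with restaurants grouped together rather than split by cuisine.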
I also made two maps. The first shows the three locations I used in my requests. The second highlights all of the train stations with a 5 km buffer (red and pink) and the public restrooms I found through the Yelp API (black). Both maps show a basic outline of the city of Los Angeles in blue. I tried to include an outline of the surrounding land and sea but could not figure out how to do this in Python in the time available.
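The buffer on the second map is visual, but the same 5 km test can be expressed as a distance check. The sketch below is not the mapping code from the project; it assumes stations and restrooms are available as (latitude, longitude) pairs and uses the haversine great-circle formula with a mean Earth radius, which is accurate enough at city scale. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_buffer(point, stations, radius_km=5.0):
    """True if the point lies within radius_km of at least one station."""
    lat, lon = point
    return any(
        haversine_km(lat, lon, s_lat, s_lon) <= radius_km
        for s_lat, s_lon in stations
    )


def restrooms_near_stations(restrooms, stations, radius_km=5.0):
    """Keep only the restroom points that fall inside some station's buffer."""
    return [p for p in restrooms if within_buffer(p, stations, radius_km)]
```

Counting the points returned by `restrooms_near_stations` per station, rather than overall, would be one way to compare restroom availability against station-level ridership if that data became available.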

CONCLUSION
First, I realized that I did not have enough data on public restrooms to reach a conclusion. I might have had better results if I had used more than three locations and run more loops, although more locations might also have produced more duplicates. I also realized that the ridership data LA Metro shares publicly on its website was not useful for answering whether the number of public restrooms meets local ridership demand; data by station would have been more helpful. Their public records request website suggests that they can provide this data on request, but that process would likely have taken more time than I had for this project. Lastly, it was interesting to see the varying linguistic patterns businesses use when discussing restrooms.

FINAL DATA USED
“Yelp API.” Yelp. Accessed December 18, 2021. https://www.yelp.com/.
“Interactive Estimated Ridership Stats.” Metro Ridership. Accessed December 18, 2021. https://isotp.metro.net/MetroRidership/.
