The Impact of Bike-Sharing Ridership on Air Quality: A Scalable Data Science Framework

This research explores the relationship between daily Air Quality Index (AQI) values and the daily intensity of bike-share ridership in New York City. The authors designed and deployed a distributed data science framework on which to process the data and run three machine learning algorithms: Elastic Net, Random Forest Regression, and Gradient Boosted Regression Trees. Nine gigabytes of CitiBike ridership data, along with one gigabyte of AQI data, were used. All three algorithms identified bike-share ridership intensity as either the most important or the second most important feature in predicting future daily AQI values. The authors also demonstrated empirically that, although a distributed platform was necessary to ingest and pre-process the 10 gigabytes of raw data, all three algorithms ran far faster on the cleaned, joined, and aggregated data when executed on a local commodity computer than on the distributed platform.
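The pre-processing step described above (aggregating individual trips into daily ridership and joining it with daily AQI values) can be sketched on a single machine in plain Python. This is a minimal illustration under stated assumptions, not the authors' distributed implementation: the ISO-8601 trip timestamp format, the function names `daily_ride_counts`, `join_ridership_with_aqi`, and `add_next_day_target`, and the use of next-day AQI as the prediction target are all hypothetical. The input values are toy numbers, not study data.

```python
from collections import Counter
from datetime import date, datetime


def daily_ride_counts(trip_start_times):
    """Count trips per calendar day.

    Assumes (hypothetically) that each trip is represented by its start
    time as an ISO-8601 string such as "2016-07-01T08:15:00".
    """
    counts = Counter()
    for ts in trip_start_times:
        counts[datetime.fromisoformat(ts).date()] += 1
    return counts


def join_ridership_with_aqi(ride_counts, daily_aqi):
    """Inner-join daily ride counts with daily AQI values on the date.

    Days missing from either source are dropped. Returns a list of
    (day, rides, aqi) tuples sorted by date.
    """
    shared_days = sorted(set(ride_counts) & set(daily_aqi))
    return [(day, ride_counts[day], daily_aqi[day]) for day in shared_days]


def add_next_day_target(rows):
    """Attach the following day's AQI as a prediction target.

    Hypothetical target choice: today's ridership and AQI are used to
    predict tomorrow's AQI. Only consecutive calendar days are paired,
    so gaps in the joined data are skipped.
    """
    examples = []
    for (day, rides, aqi), (next_day, _, next_aqi) in zip(rows, rows[1:]):
        if (next_day - day).days == 1:
            examples.append(
                {"date": day, "rides": rides, "aqi": aqi, "next_aqi": next_aqi}
            )
    return examples


# Toy inputs for illustration only (not data from the study).
trips = [
    "2016-07-01T08:15:00",
    "2016-07-01T17:40:00",
    "2016-07-02T09:05:00",
    "2016-07-03T12:00:00",
]
daily_aqi_values = {date(2016, 7, 1): 42, date(2016, 7, 2): 55, date(2016, 7, 4): 61}

rows = join_ridership_with_aqi(daily_ride_counts(trips), daily_aqi_values)
# rows == [(date(2016, 7, 1), 2, 42), (date(2016, 7, 2), 1, 55)]
examples = add_next_day_target(rows)
# examples == [{"date": date(2016, 7, 1), "rides": 2, "aqi": 42, "next_aqi": 55}]
```

The inner join keeps only days present in both sources, so each training row has both a ridership count and an AQI value. On the full raw data, the same counting logic would typically be expressed as a group-by on the distributed platform rather than an in-memory `Counter`; once reduced to one row per day, the table is small enough for a single machine, which is consistent with the timing result reported above.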
