The impact of privacy protection measures on the utility of crowdsourced cycling data

Abstract The use of new forms of data in the transport research domain is rapidly gaining popularity. However, these data come with specific challenges, and one of the major concerns is maintaining the privacy of data subjects. One widely used approach to anonymising data is binning. Recently, data from activity-tracking applications such as Strava have been used to study and analyse active travel. Because of privacy concerns, Strava has provided its data in a discretised format since July 2018. In this study, we analyse the impact of the binning criteria on the utility of the crowdsourced data, using Strava data from 2013 to 2016 for the city of Glasgow. We applied the Strava binning criteria to the original dataset at three temporal aggregations (hourly, daily and monthly) and conducted three analyses to examine the impact. First, we compared manual cycling counts with original and binned cycling counts from Strava data. Second, we calculated net errors by comparing original and binned cycling counts from Strava data. Third, we estimated spatial autocorrelation statistics based on original and binned Strava counts and investigated the extent to which research outcomes change because of the binning approach. Our results confirm a significant amount of information loss. Worryingly, we also show that conclusions reached by previous studies could have been reversed if the new specification of the data had been used. We outline the precautions researchers and planners should take when working with the binned data.
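The second analysis, comparing original and binned counts at different temporal aggregations, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not state the Strava binning criteria, so `bin_count` uses a hypothetical rule (counts below a floor are suppressed, all others are rounded up to the next multiple of a bin width), net error is taken to mean the binned total minus the original total, and the hourly counts are made up rather than drawn from the Glasgow dataset.

```python
import math


def bin_count(count, width=5, floor=3):
    """Hypothetical binning rule (an assumption, not Strava's documented criteria).

    Counts below `floor` are suppressed to 0; all other counts are rounded
    up to the next multiple of `width`.
    """
    if count < floor:
        return 0
    return math.ceil(count / width) * width


def aggregate(counts, group_size):
    """Sum consecutive groups of `group_size` values (e.g. 24 hourly -> 1 daily)."""
    return [sum(counts[i:i + group_size]) for i in range(0, len(counts), group_size)]


def net_error(original, width=5, floor=3):
    """Binned total minus original total (one possible definition of net error)."""
    binned_total = sum(bin_count(c, width, floor) for c in original)
    return binned_total - sum(original)


# Made-up hourly counts for a single road segment over two identical days.
hourly = [0, 0, 0, 0, 1, 2, 6, 14, 21, 9, 4, 3,
          5, 4, 3, 4, 8, 17, 12, 6, 2, 1, 0, 0] * 2
daily = aggregate(hourly, 24)

# Binning is applied after aggregating to each temporal resolution.
hourly_error = net_error(hourly)
daily_error = net_error(daily)

print("net error, hourly bins:", hourly_error)
print("net error, daily bins:", daily_error)
```

In this toy series the net error differs between hourly and daily binning because each binned value carries its own rounding or suppression error, and finer aggregations contain more values. How the error behaves on real data depends on the actual binning criteria and on the distribution of counts.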
