Mitigating Bias in Big Data for Transportation

Emerging big data resources and practices provide opportunities to improve transportation safety planning and outcomes. However, researchers and practitioners recognise that big data from mobile phones, social media, and on-board vehicle systems include biases in representation and accuracy that bear on transportation safety statistics. This study examines both the sources of these biases and approaches to mitigating them, through a review of published studies and interviews with experts. Coding the qualitative data enabled topical comparisons and the calculation of reliability metrics. Results identify four categories of bias, each with mitigation approaches, that concern transportation researchers and practitioners: sampling, measurement, demographics, and aggregation. This framework for understanding and addressing bias in big data gives researchers practical approaches for working with rapidly evolving transportation data sources.
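
The abstract reports reliability metrics for the qualitative coding but does not name the statistic used. As a minimal sketch, assuming two coders who each assign exactly one of the four bias categories to every interview or literature excerpt, Cohen's kappa compares observed agreement p_o with the agreement p_e expected by chance from each coder's label frequencies: kappa = (p_o - p_e) / (1 - p_e). The function name and the coder labels below are hypothetical illustration data, not results from the study. Krippendorff's alpha is a common alternative when there are more than two coders or missing codes.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders assigning one nominal code per item.

    Hypothetical helper for illustration; assumes both lists cover the
    same items in the same order.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same, non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items both coders labelled identically.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal label proportions.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both coders used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes for eight excerpts, using the four bias categories.
coder_a = ["sampling", "measurement", "demographics", "aggregation",
           "sampling", "measurement", "demographics", "sampling"]
coder_b = ["sampling", "measurement", "demographics", "aggregation",
           "measurement", "measurement", "demographics", "sampling"]

print(round(cohens_kappa(coder_a, coder_b), 2))  # → 0.83
```

Here the coders agree on 7 of 8 excerpts (p_o = 0.875), and chance agreement from the label frequencies is p_e = 17/64, so kappa = 39/47, about 0.83. Correcting for chance matters because raw percent agreement overstates reliability when a few categories dominate the coding.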
