The Impact of Biases in the Crowdsourced Trajectories on the Output of Data Mining Processes

The emergence of the Geoweb has provided an unprecedented capacity for professional and non-professional participants to generate and share digital content through crowdsourcing projects such as OpenStreetMap (OSM) or Wikimapia. Despite the success of such projects, the impacts of the inherent biases within the ‘crowd’ and/or the ‘crowdsourced’ data it produces are not well explored. In this paper we examine the impact of biased trajectory data on the output of spatio-temporal data mining processes. To do so, we conducted an experiment in which biases were intentionally added to the input data: the input trajectories were divided into training and control datasets, but not randomly (contrary to standard data mining practice). Instead, they were divided by time of day and week, weather conditions, contributors’ gender, and the spatial and temporal density of trajectories in 1 km grid cells. The accuracy of the predictive models was then measured (for both training and control data), and the biases were gradually moderated to observe how the accuracy of the very same model changes with respect to the biased input data. We show that the same data mining technique yields different results in terms of the nature of the clusters and the identified attributes.
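The experimental design described above can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the "model" is a simple mean-speed predictor rather than a spatio-temporal clustering method, and all names (`biased_split`, `make_synthetic_trajectories`, `run_experiment`, the `period` and `speed_kmh` fields) are hypothetical. It only shows the idea of a deliberately non-random training/control split whose bias is moderated step by step toward a random split.

```python
import random


def biased_split(records, key, favoured, bias, seed=0):
    """Split records into (training, control) with a controllable bias.

    bias=1.0 sends every record whose `key` equals `favoured` to training
    and every other record to control; bias=0.0 is a random 50/50 split.
    """
    if not 0.0 <= bias <= 1.0:
        raise ValueError("bias must be in [0, 1]")
    rng = random.Random(seed)
    training, control = [], []
    for rec in records:
        if rec[key] == favoured:
            p_train = 0.5 + 0.5 * bias
        else:
            p_train = 0.5 - 0.5 * bias
        (training if rng.random() < p_train else control).append(rec)
    return training, control


def make_synthetic_trajectories(n=400, seed=1):
    """Hypothetical data: daytime trips are slower than night-time trips."""
    rng = random.Random(seed)
    records = []
    for i in range(n):
        period = "day" if i % 2 == 0 else "night"
        base = 30.0 if period == "day" else 50.0
        records.append({"period": period,
                        "speed_kmh": base + rng.uniform(-5.0, 5.0)})
    return records


def mean_abs_error(prediction, records, target):
    """Mean absolute error of a constant prediction over the records."""
    return sum(abs(r[target] - prediction) for r in records) / len(records)


def run_experiment(records, levels=(1.0, 0.5, 0.0)):
    """Fit the same trivial model under gradually moderated bias.

    Returns a list of (bias, training_error, control_error) tuples.
    """
    results = []
    for bias in levels:
        train, control = biased_split(records, "period", "day", bias)
        # Stand-in model (an assumption): predict the mean training speed.
        model = sum(r["speed_kmh"] for r in train) / len(train)
        results.append((bias,
                        mean_abs_error(model, train, "speed_kmh"),
                        mean_abs_error(model, control, "speed_kmh")))
    return results


if __name__ == "__main__":
    for bias, train_err, control_err in run_experiment(
            make_synthetic_trajectories()):
        print(f"bias={bias:.1f}  train MAE={train_err:.2f}  "
              f"control MAE={control_err:.2f}")
```

In this toy setting, a fully biased split (training on daytime trips only) gives a model that fits the training data well but the control data poorly; lowering `bias` moves the split toward the random partition used in standard practice. The same splitting function could, under the same assumptions, be keyed on weather, gender, or grid-cell density instead of time of day.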
