There's No Such Thing as the Perfect Map: Quantifying Bias in Spatial Crowd-sourcing Datasets

Crowd-sourcing has become a popular form of computer-mediated collaborative work, and OpenStreetMap is one of the most successful crowd-sourcing systems: its goal of building and maintaining an accurate global map of the world is being accomplished through contributions from over 1.2M citizens. However, within this apparently large crowd, a tiny group of highly active users is responsible for mapping almost all the content. One may thus wonder to what extent the mapped information is biased towards the interests and agenda of this group. In this paper, we present a method to quantitatively measure content bias in crowd-sourced geographic information. We then apply the method to quantify content bias across a three-year period of OpenStreetMap mapping in 40 countries. We find almost no content bias in terms of what is being mapped, but significant geographic bias; furthermore, we find that bias in terms of meticulousness varies with culture.
