OSMWatchman: Learning How to Detect Vandalized Contributions in OSM Using a Random Forest Classifier

Though Volunteered Geographic Information (VGI) has the advantage of providing free and open spatial data, it is prone to vandalism, which can severely degrade the quality of these data. Detecting vandalism in VGI is therefore a first step towards assessing these data and improving their quality. This article explores the ability of supervised machine learning approaches to detect vandalism in OpenStreetMap (OSM) automatically. Since no OSM vandalism corpus is available so far, our work first builds a corpus of vandalism data. We then investigate how well random forest methods detect vandalism on this corpus. Experimental results show that random forest classifiers perform well at detecting vandalism in the same geographical regions that were used to train the model, but have more difficulty detecting vandalism in “unfamiliar regions”.
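The abstract names random forests as the classifier but does not spell out how they work. The following is a minimal, dependency-free sketch of the general technique as Breiman formulated it: each tree is grown on a bootstrap sample, each split considers only a random subset of features, and the forest predicts by majority vote. It also includes a region hold-out split that mirrors the idea of scoring a model on an “unfamiliar” region. Everything here is a hypothetical illustration and not OSMWatchman's implementation. The feature vectors, region labels, hyperparameters, and all names (`RandomForest`, `region_holdout`, `accuracy`) are assumptions.

```python
import math
import random
from collections import Counter

# Hypothetical setup: each OSM contribution is a numeric feature vector
# (e.g. tags changed, geometry change ratio, contributor edit count), label 1 = vandalism.


class Node:
    """Decision-tree node: a leaf if `label` is set, otherwise a threshold split."""

    def __init__(self, label=None, feature=None, threshold=None, left=None, right=None):
        self.label, self.feature, self.threshold = label, feature, threshold
        self.left, self.right = left, right


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) ** 2


def best_split(X, y, features):
    """Return (score, feature, threshold) minimising weighted Gini, or None."""
    best = None
    for f in features:
        values = sorted(set(x[f] for x in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # midpoint, so both children are non-empty
            left = [lab for x, lab in zip(X, y) if x[f] <= t]
            right = [lab for x, lab in zip(X, y) if x[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, depth, max_depth, n_sub, rng):
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return Node(label=majority)
    # Random feature subset at every split: the "random" in random forest.
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))
    if split is None:  # candidate features are constant in this node
        return Node(label=majority)
    _, f, t = split
    li = [i for i, x in enumerate(X) if x[f] <= t]
    ri = [i for i, x in enumerate(X) if x[f] > t]
    grow = lambda idx: build_tree([X[i] for i in idx], [y[i] for i in idx],
                                  depth + 1, max_depth, n_sub, rng)
    return Node(feature=f, threshold=t, left=grow(li), right=grow(ri))


def predict_tree(node, x):
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


class RandomForest:
    def __init__(self, n_trees=25, max_depth=5, seed=0):
        self.n_trees, self.max_depth, self.seed = n_trees, max_depth, seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n_sub = max(1, int(math.sqrt(len(X[0]))))  # common sqrt(n_features) heuristic
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         0, self.max_depth, n_sub, rng))
        return self

    def predict(self, x):
        # Majority vote over trees (an odd n_trees avoids binary ties).
        return Counter(predict_tree(t, x) for t in self.trees).most_common(1)[0][0]


def region_holdout(samples, test_region):
    """Split (region, x, label) triples: train on other regions, test on one."""
    train = [(x, lab) for r, x, lab in samples if r != test_region]
    test = [(x, lab) for r, x, lab in samples if r == test_region]
    return train, test


def accuracy(model, pairs):
    return sum(model.predict(x) == lab for x, lab in pairs) / len(pairs)
```

Training on `region_holdout(...)[0]` and scoring on `region_holdout(...)[1]` measures performance on a region the model has never seen. A random split, by contrast, mixes every region into both sets, which corresponds to the “same geographical regions” setting. In practice a library implementation such as scikit-learn's `RandomForestClassifier` would normally be used. The hand-rolled version above only makes the mechanics visible.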
