Paper information - Mining user-generated geographic content: an interactive, crowdsourced approach to validation and supervision

This paper describes a pilot study that implements a novel approach to validate data mining tasks by using the crowd to train a classifier. This hybrid approach to processing successfully addresses challenges faced during human curation or machine processing of user-generated geographic content (UGGC), namely quality control, reproducibility, sustainability, scaling, data quality, overfitting, and training costs. We test the approach on mining UGGC to derive information on local places as humans perceive them. Specifically, we retrieve Flickr image metadata, enrich it semantically by building term vectors using a controlled vocabulary, cluster it spatially, let online participants rate those clusters, classify them into noise and places by using both semantic and cluster characteristics, let online participants supervise the classification by annotating the results, and use their feedback to improve clustering and revise the trained model. The results show that the approach is feasible and suggest future studies to improve it, while also indicating that mining places from UGGC requires more than a single source.
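The first steps of the pipeline described above (semantic enrichment with a controlled vocabulary, then spatial clustering) can be illustrated with a minimal sketch. It is not the authors' implementation. The vocabulary, the record layout, the `eps_m` and `min_pts` values, and the choice of a plain DBSCAN are all assumptions made for illustration, since the abstract does not name the clustering algorithm or its parameters. The crowd rating and the noise/place classifier are left out.

```python
import math
from collections import Counter

# Hypothetical controlled vocabulary; the paper's actual vocabulary is not given here.
VOCABULARY = ["park", "museum", "bridge", "market"]

def term_vector(tags, vocabulary=VOCABULARY):
    """Count how often each vocabulary term occurs among an image's tags."""
    counts = Counter(t.lower() for t in tags)
    return [counts[term] for term in vocabulary]

def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(a))

def dbscan(points, eps_m, min_pts):
    """Plain DBSCAN (an assumed stand-in); returns one label per point, -1 = noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points)
                if haversine_m(points[i], q) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:
                queue.extend(nb)  # j is a core point, so keep expanding
    return labels

def cluster_profiles(records, labels):
    """Sum the term vectors of all non-noise records in each cluster."""
    profiles = {}
    for rec, label in zip(records, labels):
        if label == -1:
            continue
        vec = term_vector(rec["tags"])
        total = profiles.setdefault(label, [0] * len(vec))
        for k, v in enumerate(vec):
            total[k] += v
    return profiles

# Made-up records standing in for retrieved Flickr image metadata.
records = [
    {"lat": 52.3600, "lon": 4.8852, "tags": ["Museum", "art"]},
    {"lat": 52.3601, "lon": 4.8853, "tags": ["museum"]},
    {"lat": 52.3602, "lon": 4.8851, "tags": ["park", "museum"]},
    {"lat": 52.4000, "lon": 4.9500, "tags": ["bridge"]},
]
labels = dbscan([(r["lat"], r["lon"]) for r in records], eps_m=50, min_pts=3)
profiles = cluster_profiles(records, labels)
```

In this toy input the three nearby records form one cluster and the distant one is labelled noise. The per-cluster term profile, together with cluster characteristics such as size or extent, is the kind of feature set that a downstream classifier could be trained on using participants' ratings.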