Using volunteered geographic information (VGI) in design-based statistical inference for area estimation and accuracy assessment of land cover

Volunteered Geographic Information (VGI) offers a potentially inexpensive source of reference data for estimating area and assessing map accuracy in the context of remote sensing-based land-cover monitoring. The quality of observations from VGI and the typical lack of an underlying probability sampling design raise concerns regarding use of VGI in widely applied design-based statistical inference. This article focuses on the fundamental issue of the sampling design used to acquire VGI. Design-based inference requires the sample data to be obtained via a probability sampling design. Options for incorporating VGI within design-based inference include: 1) directing volunteers to obtain data for locations selected by a probability sampling design; 2) treating VGI data as a “certainty stratum” and augmenting the VGI with data obtained from a probability sample; and 3) using VGI to create an auxiliary variable that is then used in a model-assisted estimator to reduce the standard error of an estimate produced from a probability sample. The latter two options can be implemented using VGI data that were obtained from a non-probability sampling design, but they require additional sample data to be acquired via a probability sampling design. If the only data available are VGI obtained from a non-probability sample, properties of design-based inference that are ensured by probability sampling must be replaced by assumptions that may be difficult to verify. For example, pseudo-estimation weights can be constructed that mimic weights used in stratified sampling estimators. However, accuracy and area estimates produced using these pseudo-weights still require the VGI data to be representative of the full population, a property known as “external validity”. Because design-based inference requires a probability sampling design, directing volunteers to locations specified by a probability sampling design is the most straightforward option for use of VGI in design-based inference. Combining VGI from a non-probability sample with data from a probability sample, using either the certainty stratum approach or the model-assisted approach, is a viable alternative: both approaches meet the conditions required for design-based inference and use the VGI data to reduce standard errors.
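
Options 2 and 3 can be illustrated with a minimal Python sketch. It is not the article's implementation: it assumes a binary class indicator (1 = class present), simple random sampling without replacement for the probability sample, and, for option 3, the difference estimator as one simple special case of model-assisted estimation. The function names `certainty_stratum_proportion` and `difference_estimator` and all inputs are hypothetical.

```python
from statistics import mean, variance


def certainty_stratum_proportion(vgi_labels, sample_labels, population_size):
    """Option 2 (sketch): treat all VGI units as a censused stratum.

    vgi_labels    -- 0/1 reference labels for every VGI unit (inclusion probability 1)
    sample_labels -- 0/1 labels from a simple random sample (assumed) drawn
                     without replacement from the units NOT covered by VGI
    Returns (estimated class proportion, standard error).
    """
    n_vgi = len(vgi_labels)
    n_rest = population_size - n_vgi          # size of the non-VGI stratum
    n = len(sample_labels)
    if n < 2 or n > n_rest:
        raise ValueError("need 2 <= sample size <= size of non-VGI stratum")

    w_vgi = n_vgi / population_size           # stratum weights
    w_rest = n_rest / population_size
    p_vgi = mean(vgi_labels) if n_vgi else 0.0
    p_rest = mean(sample_labels)

    estimate = w_vgi * p_vgi + w_rest * p_rest
    # The VGI stratum is fully observed, so only the sampled stratum
    # contributes sampling variance (with finite population correction).
    var = w_rest ** 2 * (1 - n / n_rest) * variance(sample_labels) / n
    return estimate, var ** 0.5


def difference_estimator(aux_population_mean, sample_y, sample_x, population_size):
    """Option 3 (sketch): difference estimator with a VGI-derived auxiliary variable.

    aux_population_mean -- mean of the auxiliary variable x over ALL population units
    sample_y            -- reference values at the probability-sample units
    sample_x            -- auxiliary values at the same sample units
    Returns (estimated population mean of y, standard error).
    """
    n = len(sample_y)
    if n < 2 or n != len(sample_x):
        raise ValueError("sample_y and sample_x must have equal length >= 2")

    residuals = [y - x for y, x in zip(sample_y, sample_x)]
    estimate = aux_population_mean + mean(residuals)
    # Precision depends on the residual variance, so a good auxiliary
    # variable (small residuals) reduces the standard error.
    var = (1 - n / population_size) * variance(residuals) / n
    return estimate, var ** 0.5


if __name__ == "__main__":
    # Hypothetical data: 100 VGI units, 40 sampled units, 1000 units in total.
    vgi = [1] * 30 + [0] * 70
    srs = [1] * 10 + [0] * 30
    print(certainty_stratum_proportion(vgi, srs, population_size=1000))

    y = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
    x = [1, 0, 0, 0, 0, 1, 0, 1, 0, 1]
    print(difference_estimator(0.30, y, x, population_size=1000))
```

In both functions the validity of the estimate rests on the probability sample alone. The VGI data affect only precision, which is why neither approach requires the VGI to be representative of the population.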
