Assessing the impact of training sample selection on accuracy of an urban classification: a case study in Denver, Colorado

Understanding the factors that influence the performance of classifications over urban areas is of considerable importance to applications of remote-sensing-derived products in urban design and planning. We examined the impact of training sample selection on a binary classification of urban and nonurban for the Denver, Colorado, metropolitan area. Complete coverage reference data for urban and nonurban cover were available for the year 1997, which allowed us to examine variability in accuracy of the classification over multiple repetitions of the training sample selection and classification process. Four sampling designs for selecting training data were evaluated. These designs represented two options for stratification (spatial and class-specific) and two options for sample allocation (proportional to area and equal allocation). The binary urban and nonurban classification was obtained by employing a decision tree classifier with Landsat imagery. The decision tree classifier was applied to 1000 training samples selected by each of the four training data sampling designs, and accuracy for each classification was derived using the complete coverage reference data. The allocation of sample size to the two classes had a greater effect on classifier performance than the spatial distribution of the training data. The choice of proportional or equal allocation depends on which accuracy objectives have higher priority for a given application. For example, proportionally allocating the training sample to urban and nonurban classes favoured user’s accuracy of urban whereas equally allocating the training sample to the two classes favoured producer’s accuracy of urban. Although this study focused on urban and nonurban classes, the results and conclusions likely generalize to any binary classification in which the two classes represent disproportionate areas.
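The two allocation options and the two urban accuracy measures contrasted above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the study: the function names (`allocate`, `draw_stratified`, `urban_accuracy`), the per-pixel label lists, and the sample size are all hypothetical, and the sketch covers only class-specific stratification, not the spatial stratification or the decision tree classifier itself.

```python
import random


def allocate(n_total, class_counts, scheme):
    """Return per-class training sample sizes for a class-stratified design.

    class_counts maps each class name to its number of pixels (its area).
    scheme is "equal" or "proportional" (proportional to area).
    """
    classes = sorted(class_counts)
    if scheme == "equal":
        sizes = {c: n_total // len(classes) for c in classes}
    elif scheme == "proportional":
        population = sum(class_counts.values())
        sizes = {c: n_total * class_counts[c] // population for c in classes}
    else:
        raise ValueError("scheme must be 'equal' or 'proportional'")
    # Give any rounding remainder to the largest class so sizes sum to n_total.
    largest = max(classes, key=lambda c: class_counts[c])
    sizes[largest] += n_total - sum(sizes.values())
    return sizes


def draw_stratified(labels, sizes, seed=0):
    """Draw a simple random sample of pixel indices within each class."""
    rng = random.Random(seed)
    chosen = []
    for cls, n in sizes.items():
        members = [i for i, lab in enumerate(labels) if lab == cls]
        chosen.extend(rng.sample(members, n))
    return chosen


def urban_accuracy(reference, predicted, target="urban"):
    """Return (user's accuracy, producer's accuracy) for the target class.

    User's accuracy: share of pixels mapped as target that are truly target.
    Producer's accuracy: share of true target pixels that are mapped as target.
    """
    correct = sum(r == target and p == target
                  for r, p in zip(reference, predicted))
    mapped = sum(p == target for p in predicted)
    actual = sum(r == target for r in reference)
    users = correct / mapped if mapped else float("nan")
    producers = correct / actual if actual else float("nan")
    return users, producers


if __name__ == "__main__":
    # Hypothetical reference map: 20% urban, 80% nonurban.
    reference = ["urban"] * 200 + ["nonurban"] * 800
    counts = {"urban": 200, "nonurban": 800}
    for scheme in ("proportional", "equal"):
        sizes = allocate(100, counts, scheme)
        sample = draw_stratified(reference, sizes)
        print(scheme, sizes, len(sample))
```

With a rare urban class, equal allocation gives the classifier far more urban training pixels than proportional allocation does, which is consistent with the trade-off reported above between user's and producer's accuracy of urban.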
