Comparison of Classification Algorithms and Training Sample Sizes in Urban Land Classification with Landsat Thematic Mapper Imagery

Although a large number of new image classification algorithms have been developed, they are rarely tested on the same classification task. In this research, using the same Landsat Thematic Mapper (TM) data set and the same classification scheme over Guangzhou City, China, we tested two unsupervised and 13 supervised classification algorithms, including a number of machine learning algorithms that have become popular in remote sensing over the past 20 years. Our analysis focused primarily on the spectral information provided by the TM data. We assessed all algorithms in a per-pixel classification experiment and all supervised algorithms in a segment-based experiment. We found that when sufficiently representative training samples were used, most algorithms performed reasonably well. A lack of training samples led to greater discrepancies in classification accuracy than the choice of classification algorithm itself. Some algorithms were more tolerant of insufficient (less representative) training samples than others. Many algorithms achieved marginally higher overall accuracy with per-segment decision making.
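The abstract contrasts per-pixel decisions with per-segment decisions and compares the results by overall accuracy. The sketch below illustrates both ideas under stated assumptions. It assumes that a per-segment decision is a plurality vote over the per-pixel labels inside each image segment. The study may instead combine pixels differently, for example by classifying segment-level features. Overall accuracy is computed as the proportion of reference samples that are labelled correctly, which equals the trace of the confusion matrix divided by its total. The function names `per_segment_vote` and `overall_accuracy` and the tie-breaking rule are hypothetical. The toy labels are invented for illustration and are not data from the study.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def per_segment_vote(pixel_labels: Sequence[int], segment_ids: Sequence[int]) -> List[int]:
    """Relabel each pixel with the most frequent per-pixel label in its segment.

    Assumption (not from the study): a per-segment decision is a plurality
    vote over per-pixel decisions. Ties are broken by the smallest class label,
    an arbitrary, hypothetical rule.
    """
    if len(pixel_labels) != len(segment_ids):
        raise ValueError("pixel_labels and segment_ids must have the same length")
    votes: Dict[int, Counter] = defaultdict(Counter)
    for label, seg in zip(pixel_labels, segment_ids):
        votes[seg][label] += 1  # count per-pixel decisions within each segment
    winner: Dict[int, int] = {}
    for seg, counter in votes.items():
        top = max(counter.values())
        winner[seg] = min(lbl for lbl, n in counter.items() if n == top)
    return [winner[seg] for seg in segment_ids]


def overall_accuracy(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of reference samples labelled correctly (confusion-matrix trace / total)."""
    if len(predicted) != len(reference) or len(reference) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


if __name__ == "__main__":
    # Toy example: two segments of three pixels each, with noisy per-pixel labels.
    reference = [0, 0, 0, 1, 1, 1]
    per_pixel = [0, 1, 0, 1, 1, 0]
    segments = [0, 0, 0, 1, 1, 1]
    per_segment = per_segment_vote(per_pixel, segments)
    print(overall_accuracy(per_pixel, reference))    # 4 of 6 correct
    print(overall_accuracy(per_segment, reference))  # 6 of 6 correct
```

In this toy case the vote removes isolated misclassified pixels, so the accuracy gain is much larger than the marginal gains the abstract reports for real imagery. A vote can also spread an error across a whole segment when most of its pixels are misclassified.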
