An assessment of the effectiveness of a random forest classifier for land-cover classification

Land cover monitoring using remotely sensed data requires robust classification methods that can accurately map complex land cover and land use categories. Random forest (RF) is a powerful machine learning classifier that is relatively unknown in land remote sensing and has not been evaluated thoroughly by the remote sensing community compared to more conventional pattern recognition techniques. Key advantages of RF include its non-parametric nature, high classification accuracy, and capability to determine variable importance. RF also provides an algorithm for estimating missing values and the flexibility to perform several types of data analysis, including regression, classification, survival analysis, and unsupervised learning. However, because the split rules used for classification are not known, RF can be considered a black-box classifier. In this paper, the performance of the RF classifier for land cover classification of a complex area is explored. Evaluation was based on several criteria: mapping accuracy, sensitivity to training data set size, and sensitivity to noise. Landsat-5 Thematic Mapper data captured in the European spring and summer were used together with auxiliary variables derived from a digital terrain model to classify 14 different land cover categories in the south of Spain. Results show that the RF algorithm yields accurate land cover classifications, with 92% overall accuracy and a Kappa index of 0.92. RF is robust to training data reduction and noise: significant differences in Kappa values were observed only for data reductions greater than 50% and noise additions greater than 20%. Additionally, the variables that RF identified as most important for classifying land cover coincided with expectations. A McNemar test indicates that the random forest model performs better overall than a single decision tree at the 0.00001 significance level.
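
The accuracy measures named in the abstract (overall accuracy, the Kappa index, and the McNemar comparison of two classifiers) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names and the toy labels are hypothetical, and the McNemar statistic is computed in its continuity-corrected chi-square form, which is an assumption because the abstract does not say which variant was used.

```python
import math
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the reference class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa: agreement corrected for the agreement expected by chance."""
    n = len(y_true)
    p_o = overall_accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal class frequencies.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate case: a single class in both reference and prediction.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def mcnemar_test(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar test on the same test samples.

    Returns (chi2, p_value) with 1 degree of freedom. Only samples on which
    exactly one of the two classifiers is correct contribute.
    """
    a_only = sum(a == t and b != t for t, a, b in zip(y_true, pred_a, pred_b))
    b_only = sum(a != t and b == t for t, a, b in zip(y_true, pred_a, pred_b))
    if a_only + b_only == 0:
        return 0.0, 1.0
    chi2 = (abs(a_only - b_only) - 1) ** 2 / (a_only + b_only)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Toy labels, invented for illustration only (not data from the paper).
    reference = ["forest", "forest", "crop", "crop"]
    predicted = ["forest", "crop", "crop", "crop"]
    print(overall_accuracy(reference, predicted))
    print(cohen_kappa(reference, predicted))
```

In this form, a small p-value from `mcnemar_test` suggests that the two classifiers differ in their error rates on the shared test set, which is the kind of comparison the abstract reports between the random forest and a single decision tree.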
