Random forest in remote sensing: A review of applications and future directions

Abstract

A random forest (RF) classifier is an ensemble classifier that builds multiple decision trees, each grown from a bootstrap sample of the training data, with a random subset of the variables considered at each split. This classifier has become popular within the remote sensing community because of the accuracy of its classifications. The overall objective of this work was to review the use of the RF classifier in remote sensing. This review has revealed that the RF classifier can successfully handle high data dimensionality and multicollinearity, and that it is both fast and relatively insensitive to overfitting. It is, however, sensitive to the sampling design. The variable importance (VI) measure provided by the RF classifier has been widely exploited, for example to reduce the dimensionality of hyperspectral data, to identify the most relevant multisource remote sensing and geographic data, and to select the most suitable season for classifying particular target classes. Further investigation is required into less commonly exploited uses of this classifier, such as sample proximity analysis for detecting and removing outliers in the training data.
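To make the two mechanisms named in the abstract concrete (random sampling of training cases and variables, and the VI measure), the following is a minimal pure-Python sketch. It is not the Breiman–Cutler reference implementation or any library's API; all function names (`fit_forest`, `predict_forest`, `permutation_importance`, and so on) are hypothetical. Assumptions: Gini impurity as the split criterion, unpruned trees, split thresholds at midpoints between distinct values, sqrt(p) candidate variables per node by default, and a permutation-based VI computed on a supplied evaluation set rather than on each tree's out-of-bag samples as in Breiman's original measure. In practice a maintained library such as scikit-learn's `RandomForestClassifier` would be used instead.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def best_split(X, y, features):
    """Return (feature, threshold) minimising the weighted Gini impurity of
    the two children, searching ONLY the candidate features for this node."""
    best = None  # (impurity, feature, threshold)
    for f in features:
        values = sorted(set(row[f] for row in X))
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # midpoint between consecutive distinct values
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return None if best is None else best[1:]

def build_tree(X, y, m_try, rng):
    """Grow an unpruned tree; every node draws a fresh random subset of
    m_try variables (the random variable selection step of RF)."""
    if len(set(y)) == 1:
        return majority(y)  # pure node -> leaf
    split = best_split(X, y, rng.sample(range(len(X[0])), m_try))
    if split is None:  # no usable threshold among the drawn variables
        return majority(y)
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return {"f": f, "t": t,
            "left": build_tree([X[i] for i in left], [y[i] for i in left], m_try, rng),
            "right": build_tree([X[i] for i in right], [y[i] for i in right], m_try, rng)}

def predict_tree(node, row):
    while isinstance(node, dict):  # internal nodes are dicts, leaves are labels
        node = node["left"] if row[node["f"]] <= node["t"] else node["right"]
    return node

def fit_forest(X, y, n_trees=50, m_try=None, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the training set."""
    rng = random.Random(seed)
    m_try = m_try or max(1, int(len(X[0]) ** 0.5))  # sqrt(p), a common default
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(len(X)) for _ in range(len(X))]  # draw with replacement
        forest.append(build_tree([X[i] for i in boot], [y[i] for i in boot], m_try, rng))
    return forest

def predict_forest(forest, row):
    return majority([predict_tree(tree, row) for tree in forest])  # majority vote

def permutation_importance(forest, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when one variable's column is shuffled.
    Simplification: Breiman's VI permutes each tree's out-of-bag samples;
    here one evaluation set (X, y) is used. Rows must be lists."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict_forest(forest, r) == yi for r, yi in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(permuted))
        importances.append(sum(drops) / n_repeats)
    return importances
```

Because each tree sees a different bootstrap sample and a different variable subset at every node, the trees are decorrelated, which is what makes averaging their votes effective. Ranking variables by the returned importances is one way the dimensionality reduction described above could be carried out, for example by keeping only the highest-ranked hyperspectral bands.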

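The sample proximity analysis mentioned in the abstract can be sketched as follows, under stated assumptions. Two training samples are "close" when they often fall into the same terminal node across the trees of a forest; the proximity of samples i and k is taken here as the fraction of trees in which they share a leaf. The outlier score follows one reading of Breiman and Cutler's description: a raw measure n / Σ prox(i, k)² over samples k of the same class, standardised within each class by the median and the mean absolute deviation from that median. Implementations differ in details (for instance, whether proximities use only out-of-bag samples and how the spread is scaled), so this is an illustration, not a reference implementation. The function names `proximity_matrix` and `outlier_scores` are hypothetical, and the input is assumed to be a matrix of leaf indices with one row per sample and one column per tree.

```python
from statistics import median

def proximity_matrix(leaves):
    """leaves[i][t] = index of the leaf that sample i reaches in tree t.
    Returns prox[i][k] = fraction of trees in which i and k share a leaf."""
    n, n_trees = len(leaves), len(leaves[0])
    return [[sum(leaves[i][t] == leaves[k][t] for t in range(n_trees)) / n_trees
             for k in range(n)] for i in range(n)]

def outlier_scores(prox, labels):
    """Per-class outlier measure: samples with low proximity to the rest of
    their own class get large scores and are candidates for removal."""
    n = len(labels)
    # k == i is included, so each sum is at least prox(i, i)^2 = 1 (no division by zero).
    raw = [n / sum(prox[i][k] ** 2 for k in range(n) if labels[k] == labels[i])
           for i in range(n)]
    scores = [0.0] * n
    for cls in set(labels):
        members = [i for i in range(n) if labels[i] == cls]
        med = median([raw[i] for i in members])
        spread = sum(abs(raw[i] - med) for i in members) / len(members)
        for i in members:
            scores[i] = (raw[i] - med) / spread if spread > 0 else 0.0
    return scores
```

With scikit-learn, a leaf-index matrix of this shape (samples by trees) is returned by `RandomForestClassifier.apply(X)`. Samples with large scores would then be inspected, and possibly relabelled or removed, before the final classifier is trained.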