Maximizing the Diversity of Ensemble Random Forests for Tree Genera Classification Using High Density LiDAR Data

Recent research into improving the effectiveness of forest inventory management using airborne LiDAR data has focused on developing advanced theories in data analytics. Furthermore, supervised learning as a predictive model for classifying tree genera (and species, where possible) has been gaining popularity as a way to reduce the labor this task requires. However, bottlenecks remain that hinder the immediate adoption of supervised learning methods. With supervised classification, training samples are required for learning the parameters that govern the performance of a classifier, yet the selection of training data is often subjective and the quality of such samples is critically important. For LiDAR scanning in forest environments, the quantification of data quality is somewhat abstract, normally referring to some metric related to the completeness of individual tree crowns; however, this issue has received little attention in the literature. Intuitively, a choice of training samples of varying quality will affect classification accuracy. In this paper, a Diversity Index (DI) is proposed that characterizes the diversity of data quality (Qi) among the selected training samples required for constructing a classification model of tree genera. The training sample is diversified in terms of data quality as opposed to the number of samples per class. The diversified training sample allows the classifier to better learn the positive and negative instances and therefore yields a higher classification accuracy in discriminating the “unknown” class samples from the “known” samples. Our algorithm is implemented within the Random Forests base classifiers with six derived geometric features from LiDAR data. The training sample contains three tree genera (pine, poplar, and maple) and the validation samples contain four labels (pine, poplar, maple, and “unknown”). Classification accuracy improved from 72.8%, when training samples were selected randomly (with stratified sample size), to 93.8%, when samples were selected with additional criteria, and from 88.4% to 93.8% when an ensemble method was used.
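The idea of diversifying a training sample by data quality rather than by class size alone can be illustrated with a minimal sketch. The abstract does not give the formula for DI or Qi, so everything below is an assumption made for illustration: quality scores are taken to lie in [0, 1], normalized Shannon entropy over quality bins stands in for the Diversity Index, and the function names (`quality_bin`, `shannon_diversity`, `diversified_sample`) and the dictionary keys `genus` and `quality` are hypothetical. It is not the authors' method.

```python
import math
import random
from collections import Counter, defaultdict


def quality_bin(q, n_bins=4):
    """Map a quality score in [0, 1] to a bin index in 0..n_bins-1 (assumed scale)."""
    return min(int(q * n_bins), n_bins - 1)


def shannon_diversity(qualities, n_bins=4):
    """Stand-in diversity score: normalized Shannon entropy of the quality bins.

    Returns 0 when all samples share one bin and 1 when bins are equally filled.
    This is NOT the paper's DI, whose definition is not given in the abstract.
    """
    counts = Counter(quality_bin(q, n_bins) for q in qualities)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(n_bins)


def diversified_sample(samples, per_class, n_bins=4, seed=0):
    """Pick up to per_class samples per genus, cycling round-robin over quality bins."""
    rng = random.Random(seed)
    by_class_bin = defaultdict(lambda: defaultdict(list))
    for s in samples:
        by_class_bin[s["genus"]][quality_bin(s["quality"], n_bins)].append(s)

    chosen = []
    for genus, bins in by_class_bin.items():
        # Shuffle each bin so the pick within a bin is random but reproducible.
        pools = [rng.sample(members, len(members)) for _, members in sorted(bins.items())]
        picked = []
        while len(picked) < per_class and any(pools):
            for pool in pools:
                if pool and len(picked) < per_class:
                    picked.append(pool.pop())
        chosen.extend(picked)
    return chosen
```

With a pool dominated by high-quality crowns, a purely random stratified draw would mostly return high-quality samples, whereas the round-robin draw above spreads the selection across the quality bins while keeping the per-class sample size fixed. The selected samples could then be passed to any Random Forests implementation.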
