Investigating the Utility of Oblique Tree-Based Ensembles for the Classification of Hyperspectral Data

Ensemble classifiers are widely used for the classification of spectroscopic data. In particular, the random forest (RF) ensemble has been applied successfully across a wide range of applications and has proven robust in handling high-dimensional data. More recently, several variants of the traditional RF algorithm, including rotation forest (rotF) and oblique random forest (oRF), have been applied to the classification of high-dimensional data. In this study, we compare the traditional RF, rotF, and oRF with three different splitting rules (ridge regression, partial least squares, and support vector machine) for discriminating healthy from infected Pinus radiata seedlings using high-dimensional spectroscopic data. We further test the robustness of these five ensemble classifiers to reduced spectral resolution, obtained by spectrally resampling (binning) the original bands. The results showed that the three oblique random forest ensembles outperformed both the traditional RF and rotF ensembles. Additionally, the rotF ensemble proved to be the least robust of the five ensembles tested. Spectral resampling of the original bands produced mixed results. Nevertheless, the results demonstrate that spectrally resampled bands offer a promising approach to classifying asymptomatic stress in Pinus radiata seedlings.
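The ensembles compared here differ mainly in how each tree node splits the data. A traditional RF node thresholds a single band (an axis-parallel split), whereas an oRF node thresholds a linear combination of bands fitted by a model such as ridge regression. The Python sketch below illustrates that idea, together with spectral binning, on synthetic data. It is a minimal illustration under stated assumptions, not the implementation used in this study. The helper names (`bin_spectra`, `ridge_direction`, `best_oblique_split`), the equal-width averaging windows, the -1/+1 label coding, the ridge penalty `lam`, and the Gini split criterion are all hypothetical choices made for illustration.

```python
import numpy as np


def bin_spectra(X, width):
    """Spectral resampling by binning: average each run of `width` adjacent bands.

    Assumption (hypothetical helper): non-overlapping, equal-width windows;
    trailing bands that do not fill a whole window are discarded.
    """
    n_samples, n_bands = X.shape
    n_bins = n_bands // width
    trimmed = X[:, : n_bins * width]
    return trimmed.reshape(n_samples, n_bins, width).mean(axis=2)


def ridge_direction(X, y, lam=1.0):
    """Hypothetical helper: ridge regression of labels coded -1/+1 on the bands.

    Features and targets are centred so the intercept drops out; the returned
    coefficient vector w defines the oblique projection X @ w.
    """
    Xc = X - X.mean(axis=0)
    t = np.where(y == 1, 1.0, -1.0)
    t = t - t.mean()
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ t)


def gini(labels):
    """Gini impurity of a 0/1 label vector (0.0 for an empty node)."""
    if labels.size == 0:
        return 0.0
    p = labels.mean()
    return 2.0 * p * (1.0 - p)


def best_oblique_split(X, y, lam=1.0):
    """Hypothetical oblique split: project onto the ridge direction, then choose
    the threshold on that projection that minimises the weighted Gini impurity."""
    w = ridge_direction(X, y, lam)
    z = X @ w
    order = np.argsort(z)
    z_sorted, y_sorted = z[order], y[order]
    n = y.size
    best_impurity, best_threshold = np.inf, None
    for i in range(1, n):
        if z_sorted[i] == z_sorted[i - 1]:
            continue  # no threshold can separate tied projections
        impurity = (i * gini(y_sorted[:i]) + (n - i) * gini(y_sorted[i:])) / n
        if impurity < best_impurity:
            best_impurity = impurity
            best_threshold = 0.5 * (z_sorted[i - 1] + z_sorted[i])
    return w, best_threshold, best_impurity


# Synthetic toy data (not from the study): 40 "spectra" with 100 bands;
# class 1 ("infected") is shifted upward in bands 40-59.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 100))
y = np.repeat([0, 1], 20)
X[y == 1, 40:60] += 1.0

X_binned = bin_spectra(X, width=10)  # 100 bands -> 10 bins
w, threshold, impurity = best_oblique_split(X_binned, y, lam=1.0)
print(X_binned.shape, round(threshold, 3), round(impurity, 3))
```

In a full oblique forest, a split of this kind would typically be computed at every node of each tree, grown on a bootstrap sample with a random subset of the (binned) bands. Partial least squares or a linear SVM would be alternative ways to obtain the direction `w`.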
