Multinomial logistic regression-based feature selection for hyperspectral data

Abstract This paper evaluates the performance of three feature selection methods based on multinomial logistic regression, and compares the best-performing of these with the support vector machine-based recursive feature elimination (SVM-RFE) approach. Two hyperspectral datasets were used, one with 65 features (DAIS data) and the other with 185 features (AVIRIS data). Results suggest that between 10 and 15 features selected using the multinomial logistic regression-based feature selection approach proposed by Cawley and Talbot achieve a significant improvement in classification accuracy over the use of all the features of the DAIS and AVIRIS datasets. In addition to this improved performance, the Cawley and Talbot approach requires no user-defined parameter, thus avoiding a model selection stage. By comparison, the other two multinomial logistic regression-based feature selection approaches each require one user-defined parameter and perform worse than the Cawley and Talbot approach in terms of (i) the number of features required to achieve classification accuracy comparable to that obtained with the full dataset, and (ii) the classification accuracy achieved by the selected features. The Cawley and Talbot approach was also found to be computationally more efficient than the SVM-RFE technique, although both use the same number of selected features to achieve an accuracy equal to, or even higher than, that achieved with the full hyperspectral datasets.
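To make the comparison concrete, the following is a minimal illustrative sketch (not the paper's implementation) of the two families of methods the abstract contrasts: band ranking from a sparse (L1-penalised) multinomial logistic regression versus SVM-based recursive feature elimination. The scikit-learn calls and the synthetic data are assumptions, standing in for the DAIS/AVIRIS band matrices.

# Sketch only: synthetic stand-in for a hyperspectral dataset
# (500 pixels x 65 bands, 5 land-cover classes).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=65, n_informative=15,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

# (1) Sparse multinomial logistic regression: the L1 penalty drives the
# weights of uninformative bands towards zero, so bands can be ranked by
# the summed magnitude of their class-wise coefficients.
mlr = LogisticRegression(penalty="l1", solver="saga", C=1.0, max_iter=5000)
mlr.fit(X, y)
mlr_scores = np.abs(mlr.coef_).sum(axis=0)      # one relevance score per band
mlr_top15 = np.argsort(mlr_scores)[::-1][:15]   # 15 highest-ranked bands

# (2) SVM-RFE: repeatedly train a linear SVM and discard the band with the
# smallest weight until the requested number of bands remains.
rfe = RFE(estimator=SVC(kernel="linear", C=1.0), n_features_to_select=15)
rfe.fit(X, y)
svm_top15 = np.where(rfe.support_)[0]

print("MLR-ranked bands:", sorted(mlr_top15.tolist()))
print("SVM-RFE bands:   ", sorted(svm_top15.tolist()))

Note that the fixed penalty C in this sketch is precisely the kind of user-defined parameter the abstract credits the Cawley and Talbot approach with eliminating: their Bayesian L1 regularisation integrates the regularisation parameter out under a Jeffreys prior [8], so no model selection stage is needed.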

[1]  José M. Bioucas-Dias et al., Fast Sparse Multinomial Regression Applied to Hyperspectral Data, 2006, ICIAR.

[2]  Thomas G. Dietterich, Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms, 1998, Neural Computation.

[3]  Lorenzo Bruzzone et al., A semilabeled-sample-driven bagging technique for ill-posed classification problems, 2005, IEEE Geoscience and Remote Sensing Letters.

[4]  Gavin C. Cawley et al., Sparse Multinomial Logistic Regression via Bayesian L1 Regularisation, 2007, NIPS.

[5]  Thomas M. Cover et al., The Best Two Independent Measurements Are Not the Two Best, 1974, IEEE Trans. Syst. Man Cybern.

[6]  Ron Kohavi et al., Wrappers for Feature Subset Selection, 1997, Artif. Intell.

[7]  Chris H. Q. Ding et al., Evolving Feature Selection, 2005, IEEE Intell. Syst.

[8]  H. Jeffreys, An invariant form for the prior probability in estimation problems, 1946, Proceedings of the Royal Society of London, Series A, Mathematical and Physical Sciences.

[9]  Mahesh Pal et al., Support vector machine-based feature selection for land cover classification: a case study with DAIS hyperspectral data, 2006.

[10]  Peng Zhang et al., Dynamic Learning of SMLR for Feature Selection and Classification of Hyperspectral Data, 2008, IEEE Geoscience and Remote Sensing Letters.

[11]  Paul M. Mather et al., Support vector machines for classification in remote sensing, 2005.

[12]  Gavin C. Cawley et al., Gene Selection in Cancer Classification using Sparse Logistic Regression with Bayesian Regularisation, 2006.

[13]  Paul M. Mather et al., Some issues in the classification of DAIS hyperspectral data, 2006.

[14]  Pramod K. Varshney et al., Logistic Regression for Feature Selection and Soft Classification of Remote Sensing Data, 2006, IEEE Geoscience and Remote Sensing Letters.

[15]  Bernhard E. Boser et al., A training algorithm for optimal margin classifiers, 1992, COLT '92.

[16]  David A. Landgrebe et al., Signal Theory Methods in Multispectral Remote Sensing, 2003.

[17]  Mahesh Pal et al., Margin-based feature selection for hyperspectral data, 2009, Int. J. Appl. Earth Obs. Geoinformation.

[18]  David A. Landgrebe et al., Covariance estimation with limited training samples, 1999, IEEE Trans. Geosci. Remote Sens.

[19]  Corinna Cortes et al., Support-Vector Networks, 1995, Machine Learning.

[20]  M. Pal, Factors influencing the accuracy of remote sensing classifications: a comparative study, 2002.

[21]  Anil K. Jain et al., Feature Selection: Evaluation, Application, and Small Sample Performance, 1997, IEEE Trans. Pattern Anal. Mach. Intell.

[22]  Peter M. Williams et al., Bayesian Regularization and Pruning Using a Laplace Prior, 1995, Neural Computation.

[23]  Johannes R. Sveinsson et al., Feature extraction for multisource data classification with artificial neural networks, 1997.

[24]  I. S. Gradshteyn et al., Table of Integrals, Series, and Products, 1976.

[25]  Peter Strobl et al., Preprocessing for the digital airborne imaging spectrometer DAIS 7915, 1996, Defense + Commercial Sensing.

[26]  Padraig Cunningham et al., Overfitting in Wrapper-Based Feature Subset Selection: The Harder You Try the Worse it Gets, 2004, SGAI Conf.

[27]  Lorenzo Bruzzone et al., A new search algorithm for feature selection in hyperspectral remote sensing images, 2001, IEEE Trans. Geosci. Remote Sens.

[28]  Giles M. Foody et al., Toward intelligent training of supervised image classifications: directing training data acquisition for SVM classification, 2004.

[29]  Nello Cristianini et al., An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, 2000.

[30]  Lawrence Carin et al., Sparse multinomial logistic regression: fast algorithms and generalization bounds, 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Robert Tibshirani et al., The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition, 2001, Springer Series in Statistics.

[32]  Joydeep Ghosh et al., Adaptive feature selection for hyperspectral data analysis using a binary hierarchical classifier and tabu search, 2003, IGARSS 2003, IEEE International Geoscience and Remote Sensing Symposium Proceedings.

[33]  Jason Weston et al., Gene Selection for Cancer Classification using Support Vector Machines, 2002, Machine Learning.

[34]  Giles M. Foody et al., Feature Selection for Classification of Hyperspectral Data by SVM, 2010, IEEE Transactions on Geoscience and Remote Sensing.

[35]  Vladimir N. Vapnik and Alexey Ya. Chervonenkis, On the uniform convergence of relative frequencies of events to their probabilities, 1971, Theory of Probability and Its Applications.

[36]  Paul M. Mather et al., Assessment of the effectiveness of support vector machines for hyperspectral data, 2004, Future Gener. Comput. Syst.

[37]  Gérard Dreyfus et al., Single-layer learning revisited: a stepwise procedure for building and training a neural network, 1989, NATO Neurocomputing.

[38]  Ian H. Witten et al., The WEKA data mining software: an update, 2009, SIGKDD Explorations.

[39]  Mahesh Pal et al., Multiclass Approaches for Support Vector Machine Based Land Cover Classification, 2008, ArXiv.

[40]  Chein-I Chang, Hyperspectral Data Exploitation: Theory and Applications, 2007.

[41]  Sohail Asghar et al., A Review of Feature Selection Techniques in Structure Learning, 2013.

[42]  Federico Girosi et al., Support Vector Machines: Training and Applications, 1997.

[43]  S. Sathiya Keerthi et al., A simple and efficient algorithm for gene selection using sparse logistic regression, 2003, Bioinform.

[44]  G. F. Hughes et al., On the mean accuracy of statistical pattern recognizers, 1968, IEEE Trans. Inf. Theory.

[45]  Paul M. Mather et al., Computer Processing of Remotely-Sensed Images: An Introduction, 1988.

[46]  Paul M. Mather et al., The role of feature selection in artificial neural network applications, 2002.

[47]  P. Groves et al., Methodology for Hyperspectral Band Selection, 2004.

[48]  G. Foody, Thematic map comparison: Evaluating the statistical significance of differences in classification accuracy, 2004.

[49]  Vladimir N. Vapnik et al., The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.