Stability analysis of hyperspectral band selection algorithms based on neighborhood rough set theory for classification

Abstract Band selection is a well-known approach for reducing the dimensionality of hyperspectral data. When neighborhood rough set theory is used to select informative bands, different selection criteria may lead to different optimal band subsets. Many studies have analyzed the classification performance of band selection algorithms and have shown that different algorithms often achieve similar classification accuracy. Therefore, rather than evaluating band selection algorithms by classification accuracy alone, their stability should also be explored. The stability of an algorithm, quantified as its sensitivity to variations in the training set, has recently attracted interest. Most stability studies compare the band subsets selected from perturbation datasets constructed either by random-removal methods or by cross-validation methods. These constructions yield either an unknown or a fixed degree of overlap between the perturbation datasets. In this work, we propose an adjustable-degree-of-overlap method for constructing perturbation datasets, which allows different levels of perturbation to be set. Using the Jaccard index as a stability metric, we explore the stability of six band selection algorithms based on neighborhood rough set theory. We experimentally demonstrate that the level of perturbation, the degree of overlap, the size of the band subset, and the size of the neighborhood all affect stability. The results show that the maximal-relevance minimal-redundancy difference band selection algorithm achieves the greatest overall stability together with good classification performance.
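
The following is a minimal sketch of the evaluation idea described in the abstract, not the authors' exact procedure: perturbation training sets are drawn with an adjustable degree of overlap, a band selector is run on each, and stability is scored as the mean pairwise Jaccard index of the selected band subsets. The function name select_bands and its parameters are hypothetical placeholders for any neighborhood-rough-set-based selector.

```python
# Illustrative sketch only; the exact perturbation construction in the paper may differ.
import itertools
import numpy as np

def make_perturbation_sets(n_samples, n_sets, set_size, overlap, seed=None):
    """Draw sample-index sets that share a common 'core' of size overlap*set_size;
    the remaining indices are sampled independently per set, so the pairwise
    overlap between perturbation datasets is controlled by `overlap`."""
    rng = np.random.default_rng(seed)
    core_size = int(round(overlap * set_size))
    core = rng.choice(n_samples, size=core_size, replace=False)
    rest = np.setdiff1d(np.arange(n_samples), core)
    sets = []
    for _ in range(n_sets):
        extra = rng.choice(rest, size=set_size - core_size, replace=False)
        sets.append(np.concatenate([core, extra]))
    return sets

def jaccard(a, b):
    """Jaccard index of two band subsets (size of intersection over union)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

def stability(band_subsets):
    """Mean pairwise Jaccard index over all selected band subsets."""
    pairs = itertools.combinations(band_subsets, 2)
    return float(np.mean([jaccard(a, b) for a, b in pairs]))

# Usage (select_bands is a placeholder for a neighborhood-rough-set selector):
# subsets = [select_bands(X[idx], y[idx], n_bands=20, neighborhood=0.1)
#            for idx in make_perturbation_sets(len(X), n_sets=10,
#                                              set_size=200, overlap=0.5)]
# print(stability(subsets))
```

Varying the overlap parameter gives the different perturbation levels discussed in the abstract, with lower overlap corresponding to stronger perturbation of the training data.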
