Ensemble of feature selection methods: A hesitant fuzzy sets approach

Highlights: Hesitant fuzzy sets are used to represent an ensemble of ranking algorithms (as a relevancy measure) and an ensemble of similarity measures (as a redundancy measure) for feature subset selection. The well-known CFS merit is fuzzified with the ensemble of feature ranking algorithms and similarity measures. The proposed MRMR-HFS is recommended for high dimensional datasets that suffer from small sample sizes, and for cases where the speed of the feature selection process matters. It can also be used when the search space is extremely large and cannot be searched by meta-heuristic algorithms. Several experimental results, together with non-parametric statistical tests, confirm the performance of MRMR-HFS in feature selection.

Recently, there has been great interest in developing feature selection methods for high dimensional microarray datasets. In this paper, an innovative method based on the Maximum Relevancy and Minimum Redundancy (MRMR) approach using Hesitant Fuzzy Sets (HFSs) is proposed for feature subset selection; the method is called MRMR-HFS. MRMR-HFS is a novel filter-based feature selection algorithm that selects features using an ensemble of ranking algorithms (as the measure of feature-class relevancy, which must be maximized) and an ensemble of similarity measures (as the measure of feature-feature redundancy, which must be minimized). The ranking algorithms and similarity measures are combined using the fundamental concepts of the information energies of HFSs. The proposed method is inspired by Correlation-based Feature Selection (CFS) within a sequential forward search, with the aim of providing a robust feature selection tool for high dimensional problems. To evaluate the effectiveness of MRMR-HFS, experiments are carried out on nine well-known high dimensional microarray datasets. The obtained results are compared with those of similar state-of-the-art algorithms, including Correlation-based Feature Selection (CFS), Fast Correlation-Based Filter (FCBF), INTERACT (INT), and Maximum Relevancy Minimum Redundancy (MRMR). The comparison, carried out with non-parametric statistical tests, confirms that MRMR-HFS is effective for feature subset selection in high dimensional datasets in terms of accuracy, sensitivity, specificity, G-mean, and number of selected features.
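
The abstract does not give the formulas, so the following is only a rough sketch of the general idea: a CFS-style merit evaluated inside a sequential forward search, with several relevancy rankers and several similarity measures combined into one score each. The merit formula is the standard CFS one, k * r_cf / sqrt(k + k(k-1) * r_ff). The aggregation step is an assumption: a plain mean stands in for the paper's HFS information energy combination, and the function names (cfs_merit, forward_select) and input layout are hypothetical.

```python
import numpy as np


def cfs_merit(relevance, redundancy, subset):
    """Standard CFS merit of a subset: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    k = len(subset)
    # Mean feature-class relevancy over the subset.
    r_cf = float(np.mean(relevance[subset]))
    if k == 1:
        r_ff = 0.0
    else:
        # Mean feature-feature similarity over distinct pairs (off-diagonal).
        block = redundancy[np.ix_(subset, subset)]
        r_ff = float((block.sum() - np.trace(block)) / (k * (k - 1)))
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)


def forward_select(relevance_scores, similarity_matrices):
    """Greedy sequential forward search that maximizes the CFS merit.

    relevance_scores:    array (n_rankers, n_features), values in [0, 1]
    similarity_matrices: array (n_measures, n_features, n_features), in [0, 1]
    """
    # ASSUMPTION: a simple mean over the ensemble members; the paper
    # combines them through information energies of hesitant fuzzy sets.
    relevance = np.mean(np.asarray(relevance_scores, dtype=float), axis=0)
    redundancy = np.mean(np.asarray(similarity_matrices, dtype=float), axis=0)

    selected, best = [], 0.0
    remaining = set(range(relevance.shape[0]))
    while remaining:
        candidate, candidate_merit = None, best
        for f in sorted(remaining):
            merit = cfs_merit(relevance, redundancy, selected + [f])
            if merit > candidate_merit:
                candidate, candidate_merit = f, merit
        if candidate is None:
            break  # no remaining feature improves the merit
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_merit
    return selected, best
```

Calling forward_select with one row of scores per ranker and one similarity matrix per measure returns the selected feature indices and the final merit. The search stops as soon as no remaining feature strictly improves the merit, which is what keeps highly relevant but redundant features out of the subset.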
