Feature selection based on FDA and F-score for multi-class classification

F-score is a simple feature selection technique; however, it works only for two classes. This paper proposes a novel feature ranking method based on Fisher discriminant analysis (FDA) and F-score, denoted FDAF-score, which considers the relative distribution of classes in a multi-dimensional feature space. The main idea is to obtain a proper feature subset by maximizing the ratio of the average between-class distance to the relative within-class scatter. Because the method removes all insignificant features in a single pass, it can effectively reduce computational cost. Experiments on six benchmark UCI datasets and two artificial datasets demonstrate that the proposed FDAF-score algorithm not only obtains good results quickly with fewer features than the original datasets contain, but also handles classification problems with noise well.
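The abstract does not give the FDAF-score formula, so the following is only a minimal sketch of the general idea it describes: score each feature by a ratio of between-class separation to within-class scatter, then discard all low-scoring features in one pass. It uses a generic multi-class F-score (squared distance of each class mean from the overall mean, divided by the sum of per-class sample variances) as a stand-in, not the paper's method. The function names `f_score` and `select_features` and the mean-score cutoff are assumptions made for illustration.

```python
import numpy as np


def f_score(X, y):
    """Per-feature ratio of between-class separation to within-class scatter.

    Generic multi-class F-score, used here as a stand-in; it is NOT the
    FDAF-score defined in the paper.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]  # samples of one class (needs at least 2 rows)
        between += (Xc.mean(axis=0) - overall_mean) ** 2
        within += Xc.var(axis=0, ddof=1)  # unbiased per-class variance
    # Small constant guards against division by zero for constant features.
    return between / (within + 1e-12)


def select_features(X, y, threshold=None):
    """Keep every feature whose score reaches the threshold, in one pass."""
    scores = f_score(X, y)
    if threshold is None:
        threshold = scores.mean()  # assumption: mean score as the cutoff
    keep = np.flatnonzero(scores >= threshold)
    return keep, scores


if __name__ == "__main__":
    # Toy data: feature 0 separates three classes, feature 1 is pure noise.
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1, 2], 50)
    X = np.column_stack([y * 5.0 + rng.normal(size=150),
                         rng.normal(size=150)])
    keep, scores = select_features(X, y)
    print("scores:", scores)
    print("kept feature indices:", keep)
```

Because every feature is scored independently and thresholded once, the cost grows linearly with the number of features, which is the kind of saving the abstract attributes to removing all insignificant features together.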
