A high-dimensional classification approach based on class-dependent feature subspace

Purpose
The purpose of this paper is to build a compact and accurate classifier for high-dimensional classification.

Design/methodology/approach
A classification approach based on class-dependent feature subspace (CFS) is proposed. CFS is a class-dependent integration of a support vector machine (SVM) classifier and its associated discriminative features. For each class, a genetic algorithm (GA)-based approach simultaneously evolves the best subset of discriminative features and the SVM classifier. To guarantee convergence and efficiency, the authors customize the GA in terms of encoding strategy, fitness evaluation, and genetic operators.

Findings
Experimental studies demonstrated that the proposed CFS-based approach is superior to other state-of-the-art classification algorithms on UCI data sets in terms of both concise interpretation and predictive power for high-dimensional data.

Research limitations/implications
UCI data sets rather than real industrial data are used to evaluate the proposed approach. In addition, only single-label classification is addressed in the study.

Practical implications
The proposed method not only constructs an accurate classification model but also obtains a compact combination of discriminative features. This helps business decision makers gain a concise understanding of high-dimensional data.

Originality/value
The authors propose a compact and effective classification approach for high-dimensional data. Instead of using the same feature subset for all classes, the proposed CFS-based approach obtains the optimal subset of discriminative features and an SVM classifier for each class. The proposed approach enhances both interpretability and predictive power for high-dimensional data.
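The per-class search described under Design/methodology/approach can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the encoding, fitness function, or genetic operators, so everything below is an assumption. In particular, a nearest-centroid rule stands in for the SVM so that the example runs with the Python standard library only, accuracy is measured on the training data for brevity, and the names `centroid_accuracy`, `fitness`, `evolve_mask`, `class_dependent_subsets` and the `penalty` weight are hypothetical. Only the feature mask is evolved here; the paper also evolves the classifier.

```python
import random


def centroid_accuracy(X, y_bin, mask):
    """One-vs-rest training accuracy of a nearest-centroid rule on the
    features selected by mask. Stand-in for the SVM (assumption)."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything

    def centroid(label):
        rows = [x for x, t in zip(X, y_bin) if t == label]
        return [sum(r[j] for r in rows) / len(rows) for j in cols]

    pos, neg = centroid(1), centroid(0)
    correct = 0
    for x, t in zip(X, y_bin):
        d_pos = sum((x[j] - c) ** 2 for j, c in zip(cols, pos))
        d_neg = sum((x[j] - c) ** 2 for j, c in zip(cols, neg))
        correct += int((1 if d_pos < d_neg else 0) == t)
    return correct / len(X)


def fitness(X, y_bin, mask, penalty=0.01):
    # Reward accuracy and penalise subset size; the weight is hypothetical.
    return centroid_accuracy(X, y_bin, mask) - penalty * sum(mask)


def evolve_mask(X, y_bin, rng, pop_size=20, generations=30, p_mut=0.1):
    """Evolve a binary feature mask for one class (needs >= 2 features)."""
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=lambda m: fitness(X, y_bin, m))

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if fitness(X, y_bin, a) >= fitness(X, y_bin, b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # bit-flip mutation
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=lambda m: fitness(X, y_bin, m))
    return best


def class_dependent_subsets(X, y, seed=0):
    """Return one evolved feature mask per class (one-vs-rest)."""
    rng = random.Random(seed)
    subsets = {}
    for c in sorted(set(y)):
        y_bin = [1 if t == c else 0 for t in y]
        subsets[c] = evolve_mask(X, y_bin, rng)
    return subsets
```

Calling `class_dependent_subsets(X, y)` on a list of feature rows `X` and a list of labels `y` returns a dictionary that maps each class to its own 0/1 feature mask, which is the class-dependent aspect the abstract emphasises. To move closer to the described approach, the scorer could be replaced by a cross-validated SVM and the chromosome extended with the classifier's parameters.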
