A Sparse-Modeling Based Approach for Class-Specific Feature Selection

In this work, we propose a novel feature selection framework called Sparse-Modeling Based Approach for Class-Specific Feature Selection (SMBA-CSFS), which simultaneously exploits the ideas of sparse modeling and class-specific feature selection. Feature selection plays a key role in several fields (e.g., computational biology): models built on fewer variables are easier to explain, provide valuable insights into the role of each variable, and can speed up experimental validation. Unfortunately, as corroborated by the no free lunch theorems, no approach in the literature is universally best at detecting the optimal feature subset for a final model, so feature selection remains a challenge. The proposed procedure is a two-step approach: (a) a sparse modeling-based learning technique first finds the best subset of features for each class of a training set; (b) the discovered feature subsets are then fed to a class-specific feature selection scheme to assess the effectiveness of the selected features in classification tasks. To this end, an ensemble of classifiers is built, where each classifier is trained on the feature subset discovered for its class in the previous phase, and a suitable decision rule is adopted to combine the ensemble responses. To evaluate the performance of the proposed method, extensive experiments were performed on publicly available datasets, mainly from computational biology, where feature selection is indispensable: acute lymphoblastic leukemia and acute myeloid leukemia, human carcinomas, human lung carcinomas, diffuse large B-cell lymphoma, and malignant glioma. SMBA-CSFS identifies the most representative features that maximize classification accuracy. With the top 20 and top 80 features, SMBA-CSFS performs promisingly against its competitors from the literature on all considered datasets, especially those with a larger number of features. The experiments show that the proposed approach may outperform state-of-the-art methods when the number of features is high, so it lends itself to the selection and classification of data with a large number of features and classes.

Subjects: Bioinformatics, Data Mining and Machine Learning, Data Science

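To make the two-step pipeline concrete, here is a minimal sketch in Python. It is illustrative only: step (a) is approximated with scikit-learn's MultiTaskLasso, which solves a row-sparse self-representation problem as a stand-in for the paper's actual sparse-modeling solver, and step (b) assumes a one-vs-rest ensemble with a max-confidence decision rule; the names smba_select and CsfsEnsemble, and the hyperparameters, are hypothetical, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso, LogisticRegression

def smba_select(X_class, n_features, alpha=0.05):
    """Rank the features of one class via sparse self-representation.

    Approximately solves  min_C 0.5*||X - X C||_F^2 + alpha*||C||_{2,1}
    so that a few feature columns reconstruct all the others; the features
    whose rows of C have the largest l2 norms are kept as representatives.
    """
    model = MultiTaskLasso(alpha=alpha, max_iter=2000)
    model.fit(X_class, X_class)          # self-representation: targets Y = X
    C = model.coef_.T                    # (n_feat, n_feat), row-sparse in features
    scores = np.linalg.norm(C, axis=1)   # row l2 norms = feature importance
    return np.argsort(scores)[::-1][:n_features]

class CsfsEnsemble:
    """One-vs-rest ensemble: each member sees only its class's feature subset."""

    def fit(self, X, y, n_features=20):
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.subsets_, self.members_ = {}, {}
        for c in self.classes_:
            idx = smba_select(X[y == c], n_features)    # step (a): per-class subset
            clf = LogisticRegression(max_iter=1000)
            clf.fit(X[:, idx], (y == c).astype(int))    # step (b): class-specific member
            self.subsets_[c], self.members_[c] = idx, clf
        return self

    def predict(self, X):
        # Decision rule: the class whose member is most confident wins.
        conf = np.column_stack(
            [self.members_[c].predict_proba(X[:, self.subsets_[c]])[:, 1]
             for c in self.classes_])
        return self.classes_[conf.argmax(axis=1)]

# Usage (X: samples-by-features matrix, y: class labels):
#   model = CsfsEnsemble().fit(X_train, y_train, n_features=20)
#   y_pred = model.predict(X_test)
```

On expression datasets with thousands of genes, the self-representation fit is the bottleneck; the full method relies on an ADMM formulation of the sparse-modeling step, while this sketch trades fidelity for brevity.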