Hybrid feature selection method for biomedical datasets

Classifying high-dimensional data remains a challenging problem: high-dimensional feature spaces degrade both the accuracy and the efficiency of supervised learning methods. To address this issue, we present a fast and efficient feature selection algorithm that facilitates the classification of high-dimensional datasets such as those arising in bioinformatics. Our method uses a Laplacian score ranking to reduce the search space, combined with a simple wrapper strategy that searches for a good subset of uncorrelated features. The result is a hybrid feature selection method suited to high-dimensional spaces. Experiments on gene microarray datasets demonstrate the effectiveness and robustness of the proposed method.
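As a rough illustration of the two stages described above, the following Python sketch ranks features by Laplacian score and then runs a greedy wrapper over the top-ranked candidates. It is not the authors' implementation. The unsupervised k-nearest-neighbour graph with heat-kernel weights, the Pearson correlation threshold `max_corr`, the leave-one-out 1-nearest-neighbour evaluator, and all function and parameter names are assumptions chosen for illustration only.

```python
import numpy as np


def laplacian_scores(X, k=5, t=None):
    """Laplacian score of each column of X; lower scores preserve locality better."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbour
    # Indices of the k nearest neighbours of every sample.
    nn = np.argsort(d2, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), nn.shape[1])
    cols = nn.ravel()
    if t is None:
        # Assumed heuristic: kernel width = median neighbour distance.
        t = np.median(d2[rows, cols])
    # Heat-kernel weights on a symmetrised kNN graph.
    S = np.zeros((n, n))
    S[rows, cols] = np.exp(-d2[rows, cols] / t)
    S = np.maximum(S, S.T)
    d = S.sum(axis=1)          # node degrees
    L = np.diag(d) - S         # graph Laplacian
    # Remove the degree-weighted mean from every feature.
    Xc = X - (d @ X) / d.sum()
    num = np.sum(Xc * (L @ Xc), axis=0)
    den = np.sum(Xc * (d[:, None] * Xc), axis=0)
    return num / np.maximum(den, 1e-12)


def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)
    return float(np.mean(y[np.argmin(d2, axis=1)] == y))


def hybrid_select(X, y, n_candidates=20, max_corr=0.9):
    """Filter stage (Laplacian ranking) followed by a greedy wrapper stage."""
    ranked = np.argsort(laplacian_scores(X))[:n_candidates]
    selected, best = [], 0.0
    for j in ranked:
        # Skip candidates strongly correlated with an already chosen feature.
        if any(abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) > max_corr
               for s in selected):
            continue
        # Keep the candidate only if it improves the wrapper's accuracy.
        acc = loo_1nn_accuracy(X[:, selected + [int(j)]], y)
        if acc > best:
            selected.append(int(j))
            best = acc
    return selected, best
```

Calling `hybrid_select(X, y)` on a samples-by-genes matrix `X` and a label vector `y` returns the chosen column indices and the wrapper accuracy reached. Restricting the wrapper to `n_candidates` top-ranked features is what keeps the search cheap when the number of genes is in the thousands; the classifier and thresholds here are placeholders that could be swapped for whatever the experiments require.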
