A hybrid approach for gene selection and classification using support vector machine

Deoxyribonucleic acid (DNA) microarray technology makes it possible to measure the expression of thousands of genes on a single chip. Analyzing gene expression data plays a vital role in understanding diseases and discovering medicines. Classification of cancer based on gene expression data is a promising research area in bioinformatics and data mining. Not all genes contribute to the efficient classification of samples. Hence, a robust feature selection method is required to identify the relevant genes that help classify samples effectively. Most existing feature selection methods are computationally expensive. Redundancy in gene expression data lowers classification accuracy and also degrades multi-class classification. This paper proposes an ensemble feature selection technique that combines Recursive Feature Elimination (RFE) and the Based Bayes error Filter (BBF) for gene selection, with a Support Vector Machine (SVM) for classification. The proposed ensemble gene selection method yields classification performance comparable to that of existing classifiers and provides new insight into feature selection.
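To make the filter side of such a pipeline concrete, the sketch below scores each gene by an upper bound on its two-class Bayes error and keeps the lowest-scoring genes. It is an illustration only and not the published BBF procedure. It assumes Gaussian class-conditional densities and uses the Bhattacharyya bound as a stand-in for the Bayes error. The function names `bhattacharyya_error_bound` and `rank_genes` and the toy data are hypothetical. In a hybrid scheme, the surviving genes would then be passed to a wrapper stage such as SVM-based RFE, which is not shown here.

```python
import math
from statistics import mean, pvariance


def bhattacharyya_error_bound(x0, x1, eps=1e-12):
    """Upper bound on the two-class Bayes error for a single gene.

    Assumption of this sketch: each class is univariate Gaussian.
    The bound is sqrt(p0 * p1) * exp(-D_B), where D_B is the
    Bhattacharyya distance between the two class densities.
    """
    m0, m1 = mean(x0), mean(x1)
    # eps guards against zero variance in tiny samples
    v0, v1 = pvariance(x0) + eps, pvariance(x1) + eps
    p0 = len(x0) / (len(x0) + len(x1))
    p1 = 1.0 - p0
    d_b = (0.25 * (m0 - m1) ** 2 / (v0 + v1)
           + 0.5 * math.log((v0 + v1) / (2.0 * math.sqrt(v0 * v1))))
    return math.sqrt(p0 * p1) * math.exp(-d_b)


def rank_genes(X, y, k):
    """Return the indices of the k genes with the smallest error bound.

    X is a list of samples (rows), each a list of expression values;
    y holds the class labels 0 or 1.
    """
    scores = []
    for g in range(len(X[0])):
        x0 = [row[g] for row, label in zip(X, y) if label == 0]
        x1 = [row[g] for row, label in zip(X, y) if label == 1]
        scores.append((bhattacharyya_error_bound(x0, x1), g))
    scores.sort()
    return [g for _, g in scores[:k]]


# Toy data (made up): gene 0 separates the classes, genes 1 and 2 do not.
X = [
    [1.0, 5.0, 0.2],
    [1.2, 4.8, 0.9],
    [0.9, 5.1, 0.1],
    [3.0, 5.0, 0.8],
    [3.2, 4.9, 0.3],
    [2.9, 5.2, 0.7],
]
y = [0, 0, 0, 1, 1, 1]
selected = rank_genes(X, y, k=1)
```

A filter of this kind is cheap because every gene is scored independently, which is why it is commonly placed before a more expensive wrapper step. It does not address redundancy between genes on its own.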
