SVM-BT-RFE: An improved gene selection framework using Bayesian T-test embedded in support vector machine (recursive feature elimination) algorithm

Abstract Gene regulatory networks (GRNs) have long attracted considerable attention from bioinformaticians and systems biologists seeking to understand biological processes. The foremost difficulty, however, remains the appropriate selection of genes for expression analysis. An essential early stage in this framework is mining relevant and informative genes that yield distinguishable biological facts. To discover such genes across several datasets, we propose a gene selection algorithm called Support Vector Machine Bayesian T-Test Recursive Feature Elimination (SVM-BT-RFE), which extends the support vector machine recursive feature elimination (SVM-RFE) and support vector machine t-test recursive feature elimination (SVM-T-RFE) algorithms. The algorithm aims to attain maximum classification accuracy with smaller gene subsets of high-dimensional data. Each dataset contains approximately 5000–40,000 genes, from which a subset can be selected that delivers the highest classification accuracy. We compared the proposed SVM-BT-RFE with the existing SVM-T-RFE and SVM-RFE and found that it outperformed both. SVM-BT-RFE provided an improvement of approximately 25% over SVM-T-RFE and more than 40% over SVM-RFE. The comparison was based on classification accuracy as a function of the number of genes selected and on the classification error rate over 5 runs of the algorithm.
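The abstract does not give the ranking criterion, so the following is only a minimal sketch of how a Bayesian-style t-statistic might be embedded in recursive feature elimination. Everything specific here is an assumption made for illustration: the score is a convex combination (weight `theta`) of the normalised squared SVM weight and a variance-regularised t-statistic, the background variance is taken as the mean per-class variance over the active genes, `v0` is a hypothetical prior strength, and the linear SVM is a small batch subgradient solver rather than a production implementation. Names such as `svm_bt_rfe_sketch` and the toy data are likewise hypothetical.

```python
from statistics import mean, variance

def regularized_t(a, b, bg_var, v0=10):
    """Two-sample t-like statistic with each class variance shrunk toward bg_var."""
    def shrunk(xs):
        n = len(xs)
        return (v0 * bg_var + (n - 1) * variance(xs)) / (v0 + n - 2)
    se = (shrunk(a) / len(a) + shrunk(b) / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / se

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Batch subgradient descent on the L2-regularised hinge loss; returns weights."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge gradient
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w

def normalise(v):
    """Scale non-negative scores so they sum to 1 (all zeros stay zeros)."""
    total = sum(v)
    return [x / total for x in v] if total > 0 else [0.0] * len(v)

def svm_bt_rfe_sketch(X, y, n_keep, theta=0.5):
    """Drop the lowest-scoring gene one at a time until n_keep genes remain."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        Xa = [[row[j] for j in active] for row in X]
        w = train_linear_svm(Xa, y)
        cols = range(len(active))
        pos = [[r[k] for r, yi in zip(Xa, y) if yi == 1] for k in cols]
        neg = [[r[k] for r, yi in zip(Xa, y) if yi == -1] for k in cols]
        bg_var = mean(variance(g) for g in pos + neg)  # assumed background variance
        t = [regularized_t(p, q, bg_var) for p, q in zip(pos, neg)]
        w2n, tn = normalise([wj * wj for wj in w]), normalise(t)
        score = [theta * a + (1 - theta) * c for a, c in zip(w2n, tn)]
        worst = min(cols, key=score.__getitem__)
        del active[worst]
    return active

# Toy data: 6 samples x 4 genes; only gene 0 separates the two classes.
X = [
    [2.0, 0.1, 0.0, 0.0],
    [2.2, -0.1, 0.3, -0.1],
    [1.8, 0.0, -0.1, 0.1],
    [-2.0, -0.1, 0.1, 0.1],
    [-1.8, 0.1, -0.2, 0.0],
    [-2.2, 0.0, 0.2, -0.1],
]
y = [1, 1, 1, -1, -1, -1]
selected = svm_bt_rfe_sketch(X, y, n_keep=1)
print(selected)
```

Removing one gene per iteration keeps the sketch short. With thousands of genes, a practical variant would likely remove a fraction of the remaining genes per step, and the choice of `theta` would need to be tuned on held-out data.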
