Effective Gene Selection Method Using Bayesian Discriminant Based Criterion and Genetic Algorithms

Microarray gene expression data usually comprise a large number of genes, of which only a small fraction is informative for cancer diagnosis. This paper focuses on the effective identification of these informative genes. A recently developed gene selection criterion based on the concept of the Bayesian discriminant is used. Because the criterion measures the classification ability of a feature set, it makes excellent gene selection results possible. Apart from the cost function, this paper also addresses the drawback of the conventional sequential forward search (SFS) method: a new search strategy that combines genetic algorithms with the Bayesian discriminant criterion is designed. The proposed strategies have been thoroughly evaluated on three kinds of cancer diagnoses, using the classification results of three typical classifiers: a multilayer perceptron (MLP), a support vector machine (SVM), and a 3-nearest-neighbor rule classifier (3-NN). The results show that the proposed strategies substantially improve the performance of gene selection, and that the proposed methods are very robust in all the investigated cases.
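The idea of replacing a sequential forward search with a genetic search over gene subsets can be illustrated with a minimal sketch. It is not the paper's implementation. The abstract does not define the Bayesian discriminant criterion, so a leave-one-out nearest-neighbor accuracy (`loo_nn_accuracy`) stands in for it here. The function names, the fixed subset size `k`, the GA operators, and the toy data generator are all assumptions made for illustration.

```python
import random

def loo_nn_accuracy(X, y, genes):
    """Stand-in criterion (NOT the paper's Bayesian discriminant):
    leave-one-out 1-NN accuracy using only the gene indices in `genes`."""
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum((xi[g] - xj[g]) ** 2 for g in genes)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)

def ga_select(X, y, k, criterion, pop_size=20, generations=30,
              mutation_rate=0.2, seed=0):
    """Hypothetical GA: each individual is a set of k gene indices and
    its fitness is the value of `criterion` on that subset."""
    rng = random.Random(seed)
    n_genes = len(X[0])

    def score(ind):
        return criterion(X, y, sorted(ind))

    pop = [frozenset(rng.sample(range(n_genes), k)) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half unchanged (elitism), then refill with children.
        parents = sorted(pop, key=score, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw k genes from the union of both parents.
            child = set(rng.sample(sorted(a | b), k))
            if rng.random() < mutation_rate:
                # Mutation: drop one gene, then add one not in the child.
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice(
                    [g for g in range(n_genes) if g not in child]))
            children.append(frozenset(child))
        pop = parents + children
    best = max(pop, key=score)
    return sorted(best), score(best)

def make_toy(n=30, n_genes=12, seed=1):
    """Synthetic two-class data; only genes 0 and 1 carry class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_genes)]
        row[0] += 3 * label
        row[1] -= 3 * label
        X.append(row)
        y.append(label)
    return X, y

X, y = make_toy()
genes, fitness = ga_select(X, y, k=2, criterion=loo_nn_accuracy,
                           pop_size=8, generations=5)
print(genes, fitness)
```

Unlike SFS, which adds one gene at a time and never revisits earlier choices, the population-based search scores whole subsets and can exchange genes between candidates. Any subset-level criterion with the same call signature could be passed in place of the stand-in.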
