Effective Gene Selection Method With Small Sample Sets Using Gradient-Based and Point Injection Techniques

Microarray gene expression data usually consist of a large number of genes, of which only a small fraction is informative for cancer diagnosis. This paper focuses on the effective identification of these informative genes. We analyze gene selection models from the perspective of optimization theory and, based on this analysis, design a new strategy for modifying conventional search engines. Because microarray data sets contain few samples, overfitting is likely to occur, so we also develop a point injection technique to address it. The proposed strategies are evaluated on three kinds of cancer diagnosis. Our results show that they substantially improve the performance of gene selection, and the experimental results indicate that the proposed methods are robust under all the investigated cases.
