Towards the Enhancement of Gene Selection Performance

A microarray dataset records the expression profiles of a large number of genes. Identifying the influential genes among them is one of the main research topics in bioinformatics and has drawn much attention. In this chapter, we briefly review the existing gene selection approaches and summarize the main challenges of gene selection. We then detail strategies for addressing these challenges. Finally, using a typical gene selection model as an example, we show how these strategies are implemented and evaluate their contributions.

[1]  Michael L. Bittner,et al.  Strong Feature Sets from Small Samples , 2002, J. Comput. Biol..

[2]  D. Altshuler,et al.  Common genetic variation in IGF1 and prostate cancer risk in the Multiethnic Cohort. , 2006, Journal of the National Cancer Institute.

[3]  Xiaoxing Liu,et al.  An Entropy-based gene selection method for cancer classification using microarray data , 2005, BMC Bioinformatics.

[4]  Marina Vannucci,et al.  Gene selection: a Bayesian variable selection approach , 2003, Bioinform..

[5]  Michael Q. Zhang,et al.  Profiling alternatively spliced mRNA isoforms for prostate cancer classification , 2006, BMC Bioinformatics.

[6]  L. Mendonça-Hagler,et al.  Trends in biotechnology and biosafety in Brazil. , 2008, Environmental biosafety research.

[7]  Michael I. Jordan,et al.  Feature selection for high-dimensional genomic microarray data , 2001, ICML.

[8]  Andrew Kusiak,et al.  Data mining and genetic algorithm based gene/SNP selection , 2004, Artif. Intell. Medicine.

[9]  Jiang Gui,et al.  Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data , 2005, Bioinform..

[10]  I. Mian,et al.  Identifying marker genes in transcription profiling data using a mixture of feature relevance experts. , 2001, Physiological genomics.

[11]  R. Getzenberg,et al.  Fingerprinting the diseased prostate: Associations between BPH and prostate cancer , 2004, Journal of cellular biochemistry.

[12]  Lluís A. Belanche Muñoz,et al.  Feature selection algorithms: a survey and experimental evaluation , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[13]  C. Lopes,et al.  Aberrant cellular retinol binding protein 1 (CRBP1) gene expression and promoter methylation in prostate cancer , 2004, Journal of Clinical Pathology.

[14]  Hiroshi Motoda,et al.  Feature Selection for Knowledge Discovery and Data Mining , 1998, The Springer International Series in Engineering and Computer Science.

[15]  Robert P. W. Duin,et al.  K-nearest Neighbors Directed Noise Injection in Multilayer Perceptron Training , 2000, IEEE Trans. Neural Networks Learn. Syst..

[16]  Wentian Li,et al.  How Many Genes are Needed for a Discriminant Microarray Data Analysis , 2001, physics/0104029.

[17]  E. Parzen On Estimation of a Probability Density Function and Mode , 1962 .

[18]  Edward R. Dougherty,et al.  Superior feature-set ranking for small samples using bolstered error estimation , 2005, Bioinform..

[19]  Trevor Hastie,et al.  The Elements of Statistical Learning , 2001 .

[20]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[21]  R Ekins,et al.  Microarrays: their origins and applications. , 1999, Trends in biotechnology.

[22]  M. Deriche,et al.  Optimal feature selection using information maximisation: case of biomedical data , 2000, Neural Networks for Signal Processing X. Proceedings of the 2000 IEEE Signal Processing Society Workshop (Cat. No.00TH8501).

[23]  Rónán Daly,et al.  Inferring gene regulatory networks from classified microarray data: Initial results , 2005, BMC Bioinformatics.

[24]  E. Lander,et al.  Gene expression correlates of clinical prostate cancer behavior. , 2002, Cancer cell.

[25]  Jason Weston,et al.  Gene Selection for Cancer Classification using Support Vector Machines , 2002, Machine Learning.

[26]  Simon Lin,et al.  Methods of microarray data analysis III , 2002 .

[27]  Adrian E. Raftery,et al.  Bayesian model averaging: development of an improved multi-class, gene selection and classification tool for microarray data , 2005, Bioinform..

[28]  R. Tibshirani,et al.  Repeated observation of breast tumor subtypes in independent gene expression data sets , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[29]  S. Dudoit,et al.  Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data , 2002 .

[30]  J. Deddens,et al.  Integrin-linked kinase expression increases with prostate tumor grade. , 2001, Clinical cancer research : an official journal of the American Association for Cancer Research.

[31]  E. Dougherty,et al.  NONLINEAR PROBIT GENE CLASSIFICATION USING MUTUAL INFORMATION AND WAVELET-BASED FEATURE SELECTION , 2004 .

[32]  Josef Kittler,et al.  Floating search methods in feature selection , 1994, Pattern Recognit. Lett..

[33]  Pat Langley,et al.  Editorial: On Machine Learning , 1986, Machine Learning.

[34]  Marco Sciandrone,et al.  Efficient training of RBF neural networks for pattern recognition , 2001, IEEE Trans. Neural Networks.

[35]  Tommy W. S. Chow,et al.  Efficiently searching the important input variables using Bayesian discriminant , 2005, IEEE Transactions on Circuits and Systems I: Regular Papers.

[36]  Heekuck Oh,et al.  Neural Networks for Pattern Recognition , 1993, Adv. Comput..