Applying Machine Learning Techniques to Analysis of Gene Expression Data: Cancer Diagnosis

Classification of patient samples is a crucial aspect of cancer diagnosis. DNA hybridization arrays simultaneously measure the expression levels of thousands of genes, and it has been suggested that gene expression may provide the additional information needed to improve cancer classification and diagnosis. This paper presents methods for analyzing gene expression data to classify cancer types. Machine learning techniques, such as Bayesian networks, neural trees, and radial basis function (RBF) networks, are applied to the CAMDA Data Set 2. Each technique has its own strengths, including the ability to identify genes that are important for cancer classification, to reveal relationships among genes, and to classify cancer samples. The paper reports a comparative evaluation of the experimental results obtained with these methods.
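To make the RBF network idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes Gaussian basis functions centered on randomly chosen training samples and output weights fitted by ridge-regularized least squares. The synthetic "expression" matrix, the function names, and all parameter values are hypothetical stand-ins; the CAMDA data are not used here.

```python
import numpy as np

def rbf_features(X, centers, width):
    """Gaussian RBF activations for each sample, plus a constant bias column."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-sq_dist / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((len(X), 1))])

def fit_rbf_network(X, y, n_centers=4, width=3.0, ridge=1e-3, seed=0):
    """Pick centers from the training samples and solve for the output weights."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_centers, replace=False)]
    phi = rbf_features(X, centers, width)
    # Ridge-regularized least squares: (Phi^T Phi + ridge * I) w = Phi^T y
    gram = phi.T @ phi + ridge * np.eye(phi.shape[1])
    weights = np.linalg.solve(gram, phi.T @ y)
    return centers, weights

def predict(X, centers, weights, width=3.0):
    """Threshold the network output at 0.5 to obtain a 0/1 class label."""
    return (rbf_features(X, centers, width) @ weights > 0.5).astype(int)

# Hypothetical toy data: 40 samples x 5 "genes", two well-separated classes.
rng = np.random.default_rng(42)
class0 = rng.normal(loc=0.0, scale=0.5, size=(20, 5))
class1 = rng.normal(loc=3.0, scale=0.5, size=(20, 5))
X = np.vstack([class0, class1])
y = np.array([0] * 20 + [1] * 20)

centers, weights = fit_rbf_network(X, y)
predictions = predict(X, centers, weights)
# Accuracy on the training samples only; a real study would use held-out data.
accuracy = (predictions == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

Real expression data have far more genes than samples, so a gene selection step and evaluation on held-out samples would be needed before any comparison between classifiers is meaningful.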
