Bayesian classification of tumours by using gene expression data

Summary.  Precise classification of tumours is critical for the diagnosis and treatment of cancer. Diagnostic pathology has traditionally relied on macroscopic and microscopic histology and tumour morphology as the basis for the classification of tumours. Current classification frameworks, however, cannot discriminate between tumours with similar histopathologic features, which vary in clinical course and in response to treatment. In recent years, there has been a move towards the use of complementary deoxyribonucleic acid microarrays for the classification of tumours. These high throughput assays provide relative messenger ribonucleic acid expression measurements simultaneously for thousands of genes. A key statistical task is to perform classification via different expression patterns. Gene expression profiles may offer more information than classical morphology and may provide an alternative to classical tumour diagnosis schemes. The paper considers several Bayesian classification methods based on reproducing kernel Hilbert spaces for the analysis of microarray data. We consider the logistic likelihood as well as likelihoods related to support vector machine models. It is shown through simulation and examples that support vector machine models with multiple shrinkage parameters produce fewer misclassification errors than several existing classical methods as well as Bayesian methods based on the logistic likelihood or those involving only one shrinkage parameter.
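The two kinds of likelihood named in the summary can be illustrated with a minimal sketch. It is not the paper's implementation: the Gaussian kernel, the bandwidth `theta`, the function names and the toy expression profiles are all assumptions made for illustration, and the priors and shrinkage parameters of the Bayesian models are omitted. The sketch only shows how a kernel-based latent score f(x_i) = beta0 + sum_j beta_j K(x_i, x_j) enters a logistic negative log-likelihood and a hinge-type (support vector machine style) loss, with labels coded as -1 or +1.

```python
import math


def gaussian_kernel(x, z, theta=1.0):
    """Gaussian kernel between two expression profiles.

    The kernel choice and the bandwidth `theta` are assumptions of this
    sketch; the summary does not specify them.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / theta)


def gram_matrix(samples, theta=1.0):
    """n x n kernel matrix K with K[i][j] = kernel(x_i, x_j)."""
    return [[gaussian_kernel(xi, xj, theta) for xj in samples] for xi in samples]


def latent_scores(K, beta0, beta):
    """Latent scores f_i = beta0 + sum_j beta_j * K[i][j]."""
    return [beta0 + sum(k * b for k, b in zip(row, beta)) for row in K]


def logistic_neg_log_lik(y, f):
    """Negative log-likelihood under the logistic model, labels in {-1, +1}."""
    return sum(math.log1p(math.exp(-yi * fi)) for yi, fi in zip(y, f))


def hinge_loss(y, f):
    """Hinge loss sum_i max(0, 1 - y_i f_i), the SVM-type counterpart."""
    return sum(max(0.0, 1.0 - yi * fi) for yi, fi in zip(y, f))


# Toy data (hypothetical): three samples, two genes each.
samples = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]]
labels = [-1, 1, 1]

K = gram_matrix(samples, theta=2.0)
f = latent_scores(K, beta0=0.0, beta=[-1.0, 1.0, 0.5])

print("logistic:", logistic_neg_log_lik(labels, f))
print("hinge:   ", hinge_loss(labels, f))
```

In a Bayesian treatment, either quantity would be combined with a prior on the coefficients; using one shrinkage parameter per coefficient rather than a single shared one is the distinction the summary draws between the multiple and single shrinkage models.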
