Bayesian classification for bivariate normal gene expression

A Bayesian optimal screening method (BOSc) is proposed to classify an individual into one of two groups based on observed pairs of covariates, namely the expression levels of pairs of genes measured with DNA microarray technology. The gene pairs are first selected, by a specific method, from among the thousands of genes present on the microarray. The method is general and can be applied to any correlated pair of screening variables that either follows a bivariate normal distribution or can be transformed to one. Results on microarray data sets (Leukemia, Prostate and Breast) show that BOSc performance is competitive with, and in some cases significantly better than, quadratic and linear discriminant analysis and support vector machine classifiers. BOSc provides flexible parametric decision rules. Finally, the screening classifier allows its operating characteristics to be calculated while incorporating information about the prevalence of the disease or disease type, which is an advantage over other classification methods.
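The abstract does not spell out the decision rule, so the following is only a minimal sketch under stated assumptions, not the authors' BOSc procedure. It assumes a plug-in Bayes rule: each group's gene pair is modelled as bivariate normal with its sample mean and covariance, the disease prevalence is used as the prior probability of group 1, and an individual is assigned to group 1 when the posterior probability reaches a threshold. Operating characteristics (sensitivity, specificity and the prevalence-dependent predictive values) are then estimated by Monte Carlo under the fitted model. All function names and the `threshold` parameter are hypothetical. With plug-in estimates this rule coincides with quadratic discriminant analysis, one of the comparators named above, so it illustrates the Bayes decision and operating-characteristic calculations rather than what distinguishes BOSc.

```python
import numpy as np


def fit_gaussian(X):
    """Plug-in estimates (sample mean, sample covariance) for one group."""
    X = np.asarray(X, dtype=float)
    return X.mean(axis=0), np.cov(X, rowvar=False)


def log_density(x, mean, cov):
    """Log density of a multivariate (here bivariate) normal at each row of x."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    d = x.shape[1]
    diff = x - mean
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,jk,ik->i", diff, inv, diff)  # squared Mahalanobis distance
    return -0.5 * (maha + logdet + d * np.log(2 * np.pi))


def posterior_class1(x, params0, params1, prevalence):
    """P(group 1 | x) by Bayes' theorem, using the prevalence as prior of group 1."""
    l0 = np.log(1 - prevalence) + log_density(x, *params0)
    l1 = np.log(prevalence) + log_density(x, *params1)
    # 1 / (1 + exp(l0 - l1)), computed stably on the log scale
    return np.exp(-np.logaddexp(0.0, l0 - l1))


def classify(x, params0, params1, prevalence, threshold=0.5):
    """Assign 1 when the posterior of group 1 reaches the (hypothetical) threshold."""
    return (posterior_class1(x, params0, params1, prevalence) >= threshold).astype(int)


def operating_characteristics(params0, params1, prevalence, threshold=0.5,
                              n=200_000, seed=0):
    """Monte Carlo sensitivity, specificity, PPV and NPV under the fitted model."""
    rng = np.random.default_rng(seed)
    x0 = rng.multivariate_normal(*params0, size=n)  # draws from group 0
    x1 = rng.multivariate_normal(*params1, size=n)  # draws from group 1
    sens = classify(x1, params0, params1, prevalence, threshold).mean()
    spec = 1.0 - classify(x0, params0, params1, prevalence, threshold).mean()
    # Predictive values combine the error rates with the prevalence (Bayes' theorem)
    ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = spec * (1 - prevalence) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
    return {"sensitivity": sens, "specificity": spec, "ppv": ppv, "npv": npv}


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Simulated expression pairs (not real microarray data)
    X0 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=60)
    X1 = rng.multivariate_normal([2.0, 1.5], [[1.0, -0.3], [-0.3, 1.5]], size=40)
    p0, p1 = fit_gaussian(X0), fit_gaussian(X1)
    print(classify([[0.0, 0.0], [2.0, 1.5]], p0, p1, prevalence=0.4))
    print(operating_characteristics(p0, p1, prevalence=0.4))
```

Raising the threshold trades sensitivity for specificity. Because the prevalence enters both the prior and the predictive values, the same fitted densities give different operating characteristics for populations with different disease prevalence; numerical integration over the decision region could replace the Monte Carlo step when more precision is needed.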
