The application of naive Bayes model averaging to predict Alzheimer's disease from genome-wide data

OBJECTIVE Predicting patient outcomes from genome-wide measurements holds significant promise for improving clinical care. The large number of measurements (e.g., single nucleotide polymorphisms (SNPs)), however, makes this task computationally challenging. This paper evaluates the performance of an algorithm that predicts patient outcomes from genome-wide data by efficiently model averaging over an exponential number of naive Bayes (NB) models.

DESIGN This model-averaged naive Bayes (MANB) method was applied to predict late-onset Alzheimer's disease in 1411 individuals, each of whom had 312,318 SNP measurements available as genome-wide predictive features. Its performance was compared with that of a naive Bayes algorithm without feature selection (NB) and with feature selection (FSNB).

MEASUREMENT The performance of each algorithm was measured in terms of area under the ROC curve (AUC), calibration, and run time.

RESULTS The training time of MANB (16.1 s) was comparable to that of NB (15.6 s), whereas FSNB (1684.2 s) was considerably slower. Each of the three algorithms required less than 0.1 s to predict the outcome of a test case. MANB had an AUC of 0.72, which is significantly better than the AUC of 0.59 for NB (p<0.00001) but not significantly different from the AUC of 0.71 for FSNB. MANB was better calibrated than NB, and FSNB was better calibrated than MANB. A limitation is that only one dataset and two comparison algorithms were included in this study.

CONCLUSION MANB performed comparatively well in predicting a clinical outcome from a high-dimensional genome-wide dataset. These results support including MANB among the methods used to predict outcomes from large genome-wide datasets.
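The abstract does not spell out how averaging over an exponential number of NB models can be done efficiently. The following is a minimal sketch of one way such averaging can be carried out, not the authors' implementation. It assumes discrete features, a symmetric Dirichlet prior (`alpha`), and an independent prior probability (`prior_arc`) that each feature depends on the outcome. Under those assumptions the average over all 2^n feature subsets factorizes, so each feature only needs a posterior weight for "depends on the outcome" versus "does not". The names `log_marginal`, `fit_manb`, and `predict_proba`, and the toy data, are hypothetical.

```python
import math


def log_marginal(counts, alpha=1.0):
    """Log marginal likelihood of multinomial counts under a symmetric Dirichlet(alpha) prior."""
    k = len(counts)
    out = math.lgamma(k * alpha) - math.lgamma(k * alpha + sum(counts))
    for c in counts:
        out += math.lgamma(alpha + c) - math.lgamma(alpha)
    return out


def fit_manb(X, y, n_values=3, n_classes=2, prior_arc=0.5, alpha=1.0):
    """For each feature, compute the posterior probability that it depends on the class."""
    class_counts = [0] * n_classes
    for c in y:
        class_counts[c] += 1
    features = []
    for j in range(len(X[0])):
        # Counts of each feature value within each class, and pooled over classes.
        joint = [[0] * n_values for _ in range(n_classes)]
        for row, c in zip(X, y):
            joint[c][row[j]] += 1
        pooled = [sum(joint[c][v] for c in range(n_classes)) for v in range(n_values)]
        # Log score with the class -> feature dependence, and without it.
        log_with = math.log(prior_arc) + sum(log_marginal(jc, alpha) for jc in joint)
        log_without = math.log(1.0 - prior_arc) + log_marginal(pooled, alpha)
        m = max(log_with, log_without)
        w_with, w_without = math.exp(log_with - m), math.exp(log_without - m)
        features.append((w_with / (w_with + w_without), joint, pooled))
    return class_counts, features


def predict_proba(model, x, alpha=1.0):
    """Class posterior for one case, mixing the two per-feature estimates by arc posterior."""
    class_counts, features = model
    n_classes, n = len(class_counts), sum(class_counts)
    log_post = [math.log((class_counts[c] + alpha) / (n + n_classes * alpha))
                for c in range(n_classes)]
    for (p_arc, joint, pooled), v in zip(features, x):
        k = len(pooled)
        p_without = (pooled[v] + alpha) / (sum(pooled) + k * alpha)
        for c in range(n_classes):
            p_with = (joint[c][v] + alpha) / (sum(joint[c]) + k * alpha)
            log_post[c] += math.log(p_arc * p_with + (1.0 - p_arc) * p_without)
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    return [w / sum(weights) for w in weights]


# Toy data (hypothetical): feature 0 tracks the class, feature 1 is uninformative.
X = [[0, 0], [0, 1], [0, 0], [0, 1], [1, 0], [1, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_manb(X, y, n_values=2)
probs = predict_proba(model, [1, 0])
arc_posteriors = [f[0] for f in model[1]]
```

In this sketch, training is a single pass over the counts for each feature, which is consistent with the abstract's report that MANB trains about as fast as plain NB. Uninformative features are down-weighted rather than discarded, which is the difference from hard feature selection as in FSNB.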
