Sparse Bayesian Multiview Learning for Simultaneous Association Discovery and Diagnosis of Alzheimer's Disease

The analysis and diagnosis of many diseases, such as Alzheimer's disease (AD), usually involve two important and related tasks: i) selecting genetic and phenotypic markers for diagnosis, and ii) identifying associations between genetic and phenotypic features. Previous studies treat these two tasks separately, yet they are tightly coupled because they share the same underlying biological basis. So that each task can benefit the other, we propose a new sparse Bayesian approach that carries them out jointly. In our approach, we extract common latent features from different data sources by sparse projection matrices and then use the latent features to predict disease severity levels; in return, the disease status guides the learning of the sparse projection matrices, which not only reveal interactions between data sources but also select groups of related biomarkers. To improve the learning of the sparse projection matrices, we further incorporate graph Laplacian priors that encode linkage disequilibrium (LD) information. To estimate the model efficiently, we develop a variational inference algorithm. Analysis of an imaging genetics dataset for AD shows that our model discovers biologically meaningful associations between single nucleotide polymorphisms (SNPs) and magnetic resonance imaging (MRI) features, and achieves significantly higher accuracy in predicting ordinal AD stages than competing methods.
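The structure described above can be illustrated with a small simulation. The sketch below is not the authors' model or inference algorithm; it is a toy version under stated assumptions: both views are treated as Gaussian, sparsity comes from a simple Bernoulli mask, ordinal stages come from thresholding a linear score of the latent features, and the LD matrix is converted to an unnormalized graph Laplacian by thresholding. All function names, parameters, and default values are hypothetical.

```python
import numpy as np

def ld_graph_laplacian(ld, threshold=0.5):
    """Unnormalized graph Laplacian L = D - A built from a symmetric LD matrix.

    Assumption: SNP pairs with |LD| >= threshold are linked, weighted by |LD|.
    """
    a = np.where(np.abs(ld) >= threshold, np.abs(ld), 0.0)
    np.fill_diagonal(a, 0.0)               # no self-loops
    return np.diag(a.sum(axis=1)) - a

def sample_views(n, k, p_snp, p_mri, sparsity=0.2, n_levels=3, seed=0):
    """Draw two views and ordinal labels from shared latent features (toy model)."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((n, k))        # common latent features, one row per subject

    def sparse_projection(p):
        # Bernoulli mask times Gaussian weights: most entries are exactly zero.
        mask = rng.random((k, p)) < sparsity
        return mask * rng.standard_normal((k, p))

    w_snp = sparse_projection(p_snp)
    w_mri = sparse_projection(p_mri)
    x_snp = u @ w_snp + 0.1 * rng.standard_normal((n, p_snp))
    x_mri = u @ w_mri + 0.1 * rng.standard_normal((n, p_mri))

    # Ordinal severity level: threshold a linear score of the latent features.
    score = u @ rng.standard_normal(k)
    cuts = np.quantile(score, np.linspace(0.0, 1.0, n_levels + 1)[1:-1])
    y = np.digitize(score, cuts)           # integer levels 0 .. n_levels - 1
    return x_snp, x_mri, y, w_snp, w_mri

# Example: a Laplacian penalty w^T L w is small when linked SNPs get similar weights.
ld = np.array([[1.0, 0.8, 0.1],
               [0.8, 1.0, 0.6],
               [0.1, 0.6, 1.0]])
lap = ld_graph_laplacian(ld)
x_snp, x_mri, y, w_snp, w_mri = sample_views(n=50, k=4, p_snp=3, p_mri=5)
penalty = float(np.trace(w_snp @ lap @ w_snp.T))
```

In this toy setting, a nonzero column in both projection matrices for the same latent feature is what would signal an association between an SNP and an MRI feature, and the quadratic form with the Laplacian plays the role of a smoothness penalty that a graph Laplacian prior would induce.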
