Association Discovery and Diagnosis of Alzheimer's Disease with Bayesian Multiview Learning

The analysis and diagnosis of Alzheimer's disease (AD) can be based on genetic variations, e.g., single nucleotide polymorphisms (SNPs), and phenotypic traits, e.g., magnetic resonance imaging (MRI) features. We consider two important and related tasks: (i) selecting genetic and phenotypic markers for AD diagnosis and (ii) identifying associations between genetic and phenotypic data. While previous studies treat these two tasks separately, they are tightly coupled because the underlying associations between genetic variations and phenotypic features contain the biological basis for the disease. Here we present a new sparse Bayesian approach for joint association study and disease diagnosis. In this approach, common latent features are extracted from the different data sources via sparse projection matrices and used to predict multiple disease severity levels; in turn, the disease status guides the discovery of relationships between data sources. The sparse projection matrices not only reveal interactions between data sources but also select groups of biomarkers related to the disease. Moreover, to take advantage of linkage disequilibrium (LD), which measures the non-random association of alleles, we incorporate a graph Laplacian type of prior in the model. To learn the model from data, we develop an efficient variational inference algorithm. Analysis of an imaging genetics dataset for the study of AD indicates that our model identifies biologically meaningful associations between genetic variations and MRI features and achieves significantly higher accuracy in predicting ordinal AD stages than competing methods.
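To illustrate the idea behind a graph Laplacian type of prior, the following minimal sketch builds a Laplacian L = D - W from a small LD matrix and evaluates the quadratic form w^T L w, which equals the sum over SNP pairs of W_ij (w_i - w_j)^2. The LD values, the weight vectors, and the function names are hypothetical and chosen for illustration only; the abstract does not specify the exact form of the prior used in the model, so this should not be read as the authors' implementation.

```python
def graph_laplacian(weights):
    """Return L = D - W for a symmetric, non-negative weight matrix W."""
    n = len(weights)
    laplacian = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Degree of node i: total edge weight to all other nodes.
        degree = sum(weights[i][j] for j in range(n) if j != i)
        for j in range(n):
            laplacian[i][j] = degree if i == j else -weights[i][j]
    return laplacian


def smoothness_penalty(laplacian, w):
    """Quadratic form w^T L w; small when linked nodes have similar weights."""
    n = len(w)
    return sum(w[i] * laplacian[i][j] * w[j] for i in range(n) for j in range(n))


# Hypothetical pairwise LD strengths among four SNPs (assumed values):
# SNPs 0-1 and 2-3 are in strong LD, SNPs 1-2 only weakly.
ld = [
    [0.0, 0.9, 0.0, 0.0],
    [0.9, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.8],
    [0.0, 0.0, 0.8, 0.0],
]
L = graph_laplacian(ld)

smooth = [1.0, 1.0, -0.5, -0.5]  # strongly linked SNPs share the same weight
rough = [1.0, -1.0, 0.5, -0.5]   # strongly linked SNPs get opposite weights

penalty_smooth = smoothness_penalty(L, smooth)
penalty_rough = smoothness_penalty(L, rough)
print(penalty_smooth < penalty_rough)
```

Under a Gaussian prior whose log-density is proportional to -w^T L w, the first weight vector would be far more probable than the second, which is how such a prior can encourage SNPs in high LD to be selected together.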
