Statistical Estimation of Correlated Genome Associations to a Quantitative Trait Network

Many complex disease syndromes, such as asthma, consist of a large number of highly related, rather than independent, clinical or molecular phenotypes. This raises a new technical challenge in identifying genetic variations associated simultaneously with correlated traits. In this study, we propose a new statistical framework called graph-guided fused lasso (GFlasso) that directly incorporates the correlation structure of multiple quantitative traits, such as clinical metrics and gene expression levels, into association analysis. Our approach represents the correlation information among the quantitative traits explicitly as a quantitative trait network (QTN) and then uses this network to encode structured regularization functions in a multivariate regression model over the genotypes and traits. As a result, genetic markers that jointly influence subgroups of highly correlated traits can be detected with high sensitivity and specificity. Whereas most traditional methods examine each phenotype independently and combine the results afterwards, our approach analyzes all of the traits jointly in a single statistical framework. This allows our method to borrow information across correlated phenotypes to discover the genetic markers that perturb a subset of the correlated traits synergistically. Using simulated datasets based on HapMap consortium data and an asthma dataset, we compared the performance of our method with single-marker analysis and with regression-based methods that do not use any relational information among the traits. We found that our method showed increased power in detecting causal variants affecting correlated traits.
Our results showed that, when correlation patterns among traits in a QTN are considered explicitly and directly during a structured multivariate genome association analysis using our proposed methods, the power of detecting true causal SNPs with possibly pleiotropic effects increased significantly without compromising performance on non-pleiotropic SNPs.
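
To make the role of the QTN in the regularization concrete, the following is a minimal sketch rather than the implementation used in this study. It assumes that the QTN is built by thresholding pairwise trait correlations at a cutoff `rho`, and that the penalty takes the commonly cited graph-guided fused lasso form: an L1 sparsity term weighted by `lam`, plus a fusion term weighted by `gamma` that, for each QTN edge (m, l), penalizes the difference between a marker's effects on traits m and l after flipping the sign for negatively correlated traits. The function names, the thresholding rule, and the exact penalty form are illustrative assumptions; none of them is stated in the abstract above.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def build_qtn(Y, rho):
    """Assumed QTN construction: keep an edge (m, l, r_ml) for every trait
    pair whose absolute correlation exceeds rho.
    Y is a list of samples, each a list of K trait values."""
    traits = list(zip(*Y))  # one tuple of values per trait
    edges = []
    for m in range(len(traits)):
        for l in range(m + 1, len(traits)):
            r = pearson(traits[m], traits[l])
            if abs(r) > rho:
                edges.append((m, l, r))
    return edges


def squared_loss(X, Y, B):
    """Residual sum of squares of the multivariate regression Y ~ X B.
    B[j][k] is the effect of marker j on trait k."""
    loss = 0.0
    for x, y in zip(X, Y):
        for k, y_k in enumerate(y):
            prediction = sum(x_j * B[j][k] for j, x_j in enumerate(x))
            loss += (y_k - prediction) ** 2
    return loss


def gflasso_penalty(B, edges, lam, gamma):
    """Assumed penalty: lam * sum |B[j][k]| plus gamma times the sum over
    QTN edges (m, l) and markers j of |B[j][m] - sign(r_ml) * B[j][l]|."""
    sparsity = sum(abs(b) for row in B for b in row)
    fusion = 0.0
    for m, l, r in edges:
        sign = 1.0 if r > 0 else -1.0
        fusion += sum(abs(row[m] - sign * row[l]) for row in B)
    return lam * sparsity + gamma * fusion
```

Under these assumptions the full objective is `squared_loss(X, Y, B) + gflasso_penalty(B, edges, lam, gamma)`. When `gamma` is large relative to `lam`, a marker whose effect is shared by two positively correlated traits incurs a smaller penalty than a marker affecting only one of them, which is the sense in which the network lets the model borrow information across correlated phenotypes.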
