Inference of Causal Relationships between Biomarkers and Outcomes in High Dimensions

We describe a unified computational framework for learning causal dependencies between genotypes, biomarkers, and phenotypic outcomes from large-scale data. Unlike previous approaches, our framework allows for noisy measurements, hidden confounders, missing data, and pleiotropic effects of genotypes on outcomes. The method uses genotypes as “instrumental variables” to infer causal associations between phenotypic biomarkers and outcomes, without requiring the assumption that genotypic effects are mediated only through the observed biomarkers. The framework builds on sparse linear methods developed in statistics and machine learning, adapted here to infer the structure of richer networks with latent variables. Where the biomarkers are gene transcripts, the method can be used for fine mapping of quantitative trait loci (QTLs) detected in genetic linkage studies. To demonstrate the method, we examined the effects of gene transcript levels in the liver on plasma HDL cholesterol levels in a sample of 260 mice from a heterogeneous stock.
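To illustrate why a genotype can serve as an instrument when a hidden confounder links a biomarker to an outcome, the sketch below simulates that situation and compares a naive regression with classical two-stage least squares (2SLS). This is not the sparse latent-variable framework described above. It is the textbook single-instrument IV estimator, and it relies on the exclusion restriction (no direct genotype-to-outcome path), which is the very assumption the framework is designed to relax. All names and simulation parameters (`simulate`, effect sizes `a` and `b`, allele frequency 0.3, confounder weight 2.0) are hypothetical choices made for illustration.

```python
import numpy as np


def simulate(n=50000, a=1.0, b=0.5, seed=0):
    """Hypothetical data: genotype g -> biomarker x -> outcome y, with hidden confounder u."""
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.3, size=n).astype(float)  # genotype coded as allele count 0/1/2
    u = rng.normal(size=n)                           # unobserved confounder of x and y
    x = a * g + u + rng.normal(size=n)               # biomarker, e.g. a transcript level
    y = b * x + 2.0 * u + rng.normal(size=n)         # outcome; no direct g -> y (no pleiotropy)
    return g, x, y


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x (with intercept)."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))


def two_stage_least_squares(g, x, y):
    """Classical 2SLS with a single instrument g."""
    # Stage 1: regress the biomarker on the instrument (plus intercept).
    design = np.column_stack([np.ones_like(g), g])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    x_hat = design @ coef
    # Stage 2: regress the outcome on the instrument-predicted biomarker.
    # With one instrument this equals the Wald ratio cov(g, y) / cov(g, x).
    return ols_slope(x_hat, y)


if __name__ == "__main__":
    g, x, y = simulate()
    print("naive OLS slope:", round(ols_slope(x, y), 3))  # biased upward by u
    print("2SLS estimate:  ", round(two_stage_least_squares(g, x, y), 3))  # close to b = 0.5
```

In this simulation the naive slope is pulled well above the true effect `b = 0.5` because `u` raises both `x` and `y`, while the IV estimate stays close to 0.5. Adding a direct term such as `c * g` to `y` would bias the 2SLS estimate as well. That failure mode motivates models that allow pleiotropic effects rather than assuming them away.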
