Sparse Instrumental Variables (SPIV) for Genome-Wide Studies

This paper describes a probabilistic framework for studying associations between multiple genotypes, biomarkers, and phenotypic traits in large genetic studies, in the presence of noise and unobserved confounders. The framework builds on sparse linear methods developed for regression, modified here to infer the causal structure of richer networks with latent variables. The method is motivated by the use of genotypes as "instruments" to infer causal associations between phenotypic biomarkers and outcomes, without making the restrictive assumptions common to instrumental variable methods. The method may be used to screen effectively for potentially interesting genotype-phenotype and biomarker-phenotype associations in genome-wide studies, which may have important implications for validating biomarkers as possible proxy endpoints for early-stage clinical trials. Where the biomarkers are gene transcripts, the method can be used for fine mapping of quantitative trait loci (QTLs) detected in genetic linkage studies. The method is applied to examine the effects of gene transcript levels in the liver on plasma HDL cholesterol levels in a sample of sequenced mice from a heterogeneous stock, with ~10^5 genetic instruments and ~47 × 10^3 gene transcripts.
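To illustrate the motivating idea of using a genotype as an instrument, the following is a minimal sketch on simulated data. It is not the SPIV method: it uses the classical single-instrument ratio (Wald) estimator, which relies on exactly the kind of restrictive assumptions the abstract says SPIV avoids (here, that the genotype affects the outcome only through the biomarker and is independent of the confounder). All variable names, effect sizes, and the simulated model are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
true_effect = 0.5  # hypothetical causal effect of biomarker on outcome

# Simulated data (illustrative assumption, not the mouse data set).
g = rng.binomial(2, 0.3, size=n).astype(float)  # genotype: minor-allele count
u = rng.normal(size=n)                          # unobserved confounder
x = 1.0 * g + u + rng.normal(size=n)            # biomarker, affected by g and u
y = true_effect * x + u + rng.normal(size=n)    # outcome, affected by x and u


def cov(a, b):
    """Sample covariance of two 1-D arrays."""
    return np.mean((a - a.mean()) * (b - b.mean()))


# Naive regression of outcome on biomarker: biased by the confounder u.
ols_estimate = cov(x, y) / cov(x, x)

# Ratio (Wald) instrumental-variable estimate using the genotype g.
iv_estimate = cov(g, y) / cov(g, x)

print(f"true effect: {true_effect:.2f}")
print(f"OLS estimate: {ols_estimate:.2f}")
print(f"IV estimate:  {iv_estimate:.2f}")
```

In this simulation the naive regression overstates the effect because the confounder drives both the biomarker and the outcome, while the genotype-based ratio stays close to the simulated value. With many correlated genotypes and biomarkers, and with possible direct genotype effects on the outcome, this simple estimator no longer applies, which is the setting the sparse framework described above is meant to address.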
