Ivy: Instrumental Variable Synthesis for Causal Inference

A popular way to estimate the causal effect of a variable x on y from observational data is to use an instrumental variable (IV): a third variable z that is associated with x and affects y only through x. The more strongly z is associated with x, the more reliable the estimate, but such strong IVs are difficult to find. Instead, practitioners combine more commonly available IV candidates, which are not necessarily strong or even valid IVs, into a single "summary" that is plugged into causal effect estimators in place of an IV. In genetic epidemiology, such approaches are known as allele scores. For the resulting estimate to be reliable, allele scores require strong assumptions: that all IV candidates are mutually independent and valid. To relax these assumptions, we propose Ivy, a new method for combining IV candidates that handles correlated and invalid IV candidates in a robust manner. Theoretically, we characterize this robustness, its limits, and its impact on the resulting causal estimates. Empirically, Ivy correctly identifies the directionality of known relationships and is robust against false discovery (median effect size = 0.118).
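To make the allele-score baseline concrete, here is a minimal Python sketch of the classical approach the abstract contrasts Ivy against. It is not Ivy and not code from the paper. It sums the IV candidates into an unweighted score s and plugs s into the IV ratio estimator cov(s, y) / cov(s, x). The simulation, its coefficients, the 0/1/2 genotype coding, and the function names (`simulate`, `allele_score_iv`, `naive_ols`) are all hypothetical assumptions chosen for illustration. The sketch shows two things: why an IV helps when an unobserved confounder drives both x and y, and how a single invalid candidate biases the allele-score estimate, which is the failure mode Ivy is designed to handle.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)


def simulate(n, beta, direct_effects, seed=0):
    """Simulate IV candidates Z, exposure X and outcome Y (hypothetical model).

    Each candidate z_j takes values 0, 1, 2 (a simplified genotype coding).
    A nonzero direct_effects[j] makes candidate j invalid: it affects y
    through a path that bypasses x. An unobserved confounder u drives
    both x and y, which biases the naive regression of y on x.
    """
    rng = random.Random(seed)
    Z, X, Y = [], [], []
    for _ in range(n):
        z = [rng.randint(0, 2) for _ in direct_effects]
        u = rng.gauss(0, 1)  # unobserved confounder
        x = 0.3 * sum(z) + u + rng.gauss(0, 1)
        direct = sum(d * zj for d, zj in zip(direct_effects, z))
        y = beta * x + direct + u + rng.gauss(0, 1)
        Z.append(z)
        X.append(x)
        Y.append(y)
    return Z, X, Y


def allele_score_iv(Z, X, Y):
    """Unweighted allele score (sum of candidates) used in the IV ratio cov(s, y) / cov(s, x)."""
    s = [sum(z) for z in Z]
    return cov(s, Y) / cov(s, X)


def naive_ols(X, Y):
    """Slope of y on x, ignoring confounding."""
    return cov(X, Y) / cov(X, X)


if __name__ == "__main__":
    true_beta = 0.5
    Z, X, Y = simulate(20_000, true_beta, [0.0] * 5)
    print("naive OLS:", round(naive_ols(X, Y), 3))
    print("allele score, all candidates valid:", round(allele_score_iv(Z, X, Y), 3))
    Z, X, Y = simulate(20_000, true_beta, [0.5, 0.0, 0.0, 0.0, 0.0])
    print("allele score, one invalid candidate:", round(allele_score_iv(Z, X, Y), 3))
```

In this simulated setting, the naive regression is pulled well above the true effect of 0.5 by the confounder, while the allele-score estimate lands close to 0.5 when every candidate is valid. Giving one candidate a direct effect on y pushes the allele-score estimate noticeably above 0.5, because the unweighted sum cannot tell valid from invalid candidates.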
