On the Use of the Lasso for Instrumental Variables Estimation with Some Invalid Instruments

ABSTRACT: We investigate the behavior of the Lasso for selecting invalid instruments in linear instrumental variables models for estimating causal effects of exposures on outcomes, as proposed recently by Kang et al. Invalid instruments are those that fail the exclusion restriction and enter the model as explanatory variables. We show that in this setup the Lasso may not consistently select the invalid instruments if these are relatively strong. We propose a median estimator that is consistent when fewer than 50% of the instruments are invalid; its consistency does not depend on the relative strength of the instruments or on their correlation structure. We show that this estimator can be used for adaptive Lasso estimation, with the resulting estimator having oracle properties. The methods are applied to a Mendelian randomization study to estimate the causal effect of body mass index (BMI) on diastolic blood pressure, using data on individuals from the UK Biobank, with 96 single nucleotide polymorphisms as potential instruments for BMI. Supplementary materials for this article are available online.
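
The idea behind a median estimator of this kind can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: each instrument yields its own ratio estimate (reduced-form coefficient divided by first-stage coefficient), and the median of these ratios is taken as the causal effect estimate. The function names `median_iv_estimate` and `simulate`, and all simulation settings (sample size, instrument strengths, size of the direct effects), are hypothetical choices for illustration and are not taken from the article.

```python
import numpy as np


def median_iv_estimate(y, d, Z):
    """Median of instrument-specific ratio estimates (illustrative sketch).

    y: outcome (n,), d: exposure (n,), Z: instruments (n, k).
    """
    Zc = Z - Z.mean(axis=0)  # centre the instruments to absorb the intercept
    # Reduced form: OLS of y on all instruments.
    Gamma, *_ = np.linalg.lstsq(Zc, y - y.mean(), rcond=None)
    # First stage: OLS of d on all instruments.
    gamma, *_ = np.linalg.lstsq(Zc, d - d.mean(), rcond=None)
    ratios = Gamma / gamma  # one ratio estimate per instrument
    return np.median(ratios), ratios


def simulate(n=20000, k=10, n_invalid=3, beta=0.5, seed=0):
    """Toy data: the first n_invalid instruments also affect y directly."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n, k))
    gamma = np.full(k, 0.4)           # first-stage strengths (assumed)
    alpha = np.zeros(k)
    alpha[:n_invalid] = 0.3           # direct effects of invalid instruments
    u = rng.normal(size=n)            # unobserved confounder
    d = Z @ gamma + u + rng.normal(size=n)
    y = beta * d + Z @ alpha + u + rng.normal(size=n)
    return y, d, Z


if __name__ == "__main__":
    y, d, Z = simulate()
    estimate, ratios = median_iv_estimate(y, d, Z)
    print("median estimate:", estimate)
    print("per-instrument ratios:", np.round(ratios, 3))
```

In this toy design 3 of 10 instruments are invalid, so the majority of the ratios cluster around the true effect and the median is not pulled toward the ratios of the invalid instruments.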

[1] H. Zou. The Adaptive Lasso and Its Oracle Properties, 2006.

[2] L. Hansen. Large Sample Properties of Generalized Method of Moments Estimators, 1982.

[3] A. Belloni, et al. Sparse Models and Methods for Optimal Instruments with an Application to Eminent Domain, 2012.

[4] Peng Zhao, et al. On Model Selection Consistency of Lasso, 2006, J. Mach. Learn. Res.

[5] J. MacKinnon, et al. Estimation and inference in econometrics, 1994.

[6] P. Elliott, et al. UK Biobank: An Open Access Resource for Identifying the Causes of a Wide Range of Complex Diseases of Middle and Old Age, 2015, PLoS medicine.

[7] Lars Peter Hansen, et al. Large Sample Properties of Generalized Method of, 1982.

[8] Stephen L. Morgan, et al. Instrumental Variables Regression, 2014.

[9] Paul A. Bekker, et al. Alternative Approximations to the Distributions of Instrumental Variable Estimators, 1994.

[10] Zhipeng Liao, et al. Select the Valid and Relevant Moments: An Information-Based LASSO for GMM with Many Moments, 2013.

[11] Frank Windmeijer, et al. Instrumental Variable Estimators for Binary Outcomes, 2009.

[12] D. Lawlor, et al. Genetic markers as instrumental variables, 2011, Journal of health economics.

[13] Dylan S. Small, et al. Instrumental Variables Estimation With Some Invalid Instruments and its Application to Mendelian Randomization, 2014, 1401.5755.

[14] George Davey Smith, et al. Mendelian randomization: Using genes as instruments for making causal inferences in epidemiology, 2008, Statistics in medicine.

[15] G. Imbens. Instrumental Variables: An Econometrician's Perspective, 2014, SSRN Electronic Journal.

[16] Zhipeng Liao, et al. Adaptive GMM Shrinkage Estimation with Consistent Moment Selection, 2012, Econometric Theory.

[17] G. Davey Smith, et al. Mendelian randomization: genetic anchors for causal inference in epidemiological studies, 2014, Human molecular genetics.

[18] Martin J. Wainwright, et al. Sharp Thresholds for High-Dimensional and Noisy Sparsity Recovery Using $\ell_{1}$-Constrained Quadratic Programming (Lasso), 2009, IEEE Transactions on Information Theory.

[19] Bing-Yi Jing, et al. Self-normalized Cramér-type large deviations for independent random variables, 2003.

[20] Sander Greenland, et al. An introduction to instrumental variables for epidemiologists, 2018, International journal of epidemiology.

[21] Whitney K. Newey, et al. Generalized method of moments with many weak moment conditions, 2009.

[22] Christian Hansen, et al. Estimation With Many Instrumental Variables, 2006, Journal of Business & Economic Statistics.

[23] R. Collins. What makes UK Biobank special?, 2012, The Lancet.

[24] J. Angrist, et al. Does Compulsory School Attendance Affect Schooling and Earnings?, 1990.

[25] N. Meinshausen, et al. High-dimensional graphs and variable selection with the Lasso, 2006, math/0608017.

[26] Donald W. K. Andrews, et al. Consistent Moment Selection Procedures for Generalized Method of Moments Estimation, 1999.

[27] Dylan S. Small, et al. A review of instrumental variable estimators for Mendelian randomization, 2015, Statistical methods in medical research.

[28] Chirok Han, et al. Detecting Invalid Instruments Using L1-GMM, 2007.

[29] R. Tibshirani, et al. Least angle regression, 2004, math/0406456.

[30] Thomas J. Rothenberg, et al. Approximating the distributions of econometric estimators and test statistics, 1984.

[31] Sara van de Geer, et al. Statistics for High-Dimensional Data, 2011.

[32] Dylan S. Small, et al. Confidence intervals for causal effects with invalid instruments by using two‐stage hard thresholding with voting, 2016, 1603.05224.

[33] D. Ruppert. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2004.

[34] A. Belloni, et al. Inference on Treatment Effects after Selection Amongst High-Dimensional Controls, 2011, 1201.0224.

[35] V. Spokoiny, et al. Instrumental Variables Regression, 2018, Foundations of Modern Econometrics.

[36] Victor Chernozhukov, et al. Inference on Treatment Effects after Selection Amongst High-Dimensional Controls, 2011.

[37] J. Sargan. The Estimation of Economic Relationships Using Instrumental Variables, 1958.

[38] A. Belloni, et al. Least Squares After Model Selection in High-Dimensional Sparse Models, 2009, 1001.0188.

[39] Neil M Davies, et al. The many weak instruments problem and Mendelian randomization, 2014, Statistics in medicine.

[40] Trevor Hastie, et al. The Elements of Statistical Learning, 2001.

[41] Ross M. Fraser, et al. Genetic studies of body mass index yield new insights for obesity biology, 2015, Nature.

[42] J. Stock, et al. Instrumental Variables Regression with Weak Instruments, 1994.

[43] Raj Chetty, et al. Identification and Inference With Many Invalid Instruments, 2011.

[44] Christian Hansen, et al. Post-Selection and Post-Regularization Inference in Linear Models with Many Controls and Instruments, 2015, 1501.03185.

[45] G. Davey Smith, et al. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression, 2015, International journal of epidemiology.

[46] Hongzhe Li, et al. Regularization Methods for High-Dimensional Instrumental Variables Regression With an Application to Genetical Genomics, 2013, Journal of the American Statistical Association.