The pitfall of instrumental variables in big data: What the rule of thumb can't give you

ABSTRACT

Background: Instrumental variables (IVs) have become much easier to find in the “big data era,” which has increased the number of applications of the two-stage least squares (TSLS) model. As IVs become more widely available, the possibility that they are weak has also increased. Prior work has suggested a ‘rule of thumb’ that IVs with a first-stage F statistic of at least ten will avoid a relative bias in point estimates greater than 10%. We investigated whether this threshold is also a sufficient guarantee of low false rejection rates for null hypothesis tests in TSLS applications with many IVs.

Objective: To test how well the ‘rule of thumb’ for weak instruments predicts low false rejection rates in the TSLS model when the number of IVs is large.

Method: We used a Monte Carlo approach to create 28 original data sets for different models, with the number of IVs varying from 3 to 30. For each model, we generated 2000 observations per iteration and ran 50,000 iterations so that the rejection rates converged. The true value of the coefficient under test was set to 0, and the probability of rejecting this null hypothesis was recorded for each model as a measure of the false rejection rate. The relationship between the endogenous variable and the IVs was carefully adjusted so that the first-stage F statistic equaled ten, thus simulating the ‘rule of thumb.’

Results: The false rejection rate (type I error rate) increased with the number of IVs in the TSLS model while the first-stage F statistic was held equal to 10. The false rejection rate exceeded 10% when the TSLS model had 24 IVs and exceeded 15% when it had 30 IVs.

Conclusion: When more instrumental variables were used in the model, the ‘rule of thumb’ was no longer a sufficient guarantee of good performance in hypothesis testing. A stricter threshold for the F statistic is recommended in place of the ‘rule of thumb,’ especially when the number of instrumental variables is large.
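
The simulation design described in the Method can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the abstract does not give the data-generating process, so the standard normal instruments, the equal first-stage coefficients, the error correlation `rho = 0.5`, the omission of an intercept, the 5% two-sided test (critical value 1.96), and the function name `tsls_false_rejection_rate` are all assumptions. The first-stage strength is calibrated through the approximation E[F] ≈ 1 + n·π'π/k, which is one plausible way to hold F near ten and may differ from the adjustment the authors used.

```python
import numpy as np


def tsls_false_rejection_rate(k, n=2000, reps=2000, target_f=10.0, rho=0.5, seed=0):
    """Monte Carlo false rejection rate of H0: beta = 0 in TSLS with k instruments.

    Hypothetical helper. Returns (rejection rate, mean first-stage F).
    """
    rng = np.random.default_rng(seed)
    # Assumed calibration: equal coefficients chosen so E[F] is roughly target_f.
    pi = np.full(k, np.sqrt((target_f - 1.0) / n))
    rejections = 0
    f_total = 0.0
    for _ in range(reps):
        z = rng.standard_normal((n, k))          # instruments
        u = rng.standard_normal(n)               # structural error
        # First-stage error correlated with u (assumed endogeneity, unit variance).
        v = rho * u + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
        x = z @ pi + v                           # endogenous regressor
        y = u                                    # true beta is 0, so y = 0 * x + u

        # First stage: project x on the instruments.
        coef, *_ = np.linalg.lstsq(z, x, rcond=None)
        x_hat = z @ coef
        fs_resid = x - x_hat
        f_total += (x_hat @ x_hat / k) / (fs_resid @ fs_resid / (n - k))

        # Second stage: TSLS estimate and conventional standard error,
        # using residuals computed with the original x rather than x_hat.
        beta_hat = (x_hat @ y) / (x_hat @ x)
        resid = y - beta_hat * x
        se = np.sqrt((resid @ resid / (n - 1)) / (x_hat @ x_hat))
        rejections += int(abs(beta_hat / se) > 1.96)
    return rejections / reps, f_total / reps


if __name__ == "__main__":
    for k in (3, 10, 20, 30):
        rate, mean_f = tsls_false_rejection_rate(k, reps=1000)
        print(f"k={k:2d}  mean F={mean_f:5.2f}  false rejection rate={rate:.3f}")
```

The defaults use far fewer replications than the 50,000 reported in the abstract, so the printed rates are noisy, and their level depends on the assumed `rho`. The sketch is meant to show the mechanics of the experiment, not to reproduce the reported figures.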
