Testing endogeneity with high dimensional covariates

Modern high-dimensional data have renewed interest in instrumental variables (IV) analysis, but this work has focused primarily on estimating the effects of endogenous variables and has paid little attention to specification tests. This paper studies the Durbin-Wu-Hausman (DWH) test, a popular specification test for endogeneity in IV regression, in high dimensions. We show, surprisingly, that the DWH test maintains its size in high dimensions, but at the expense of power. We propose a new test that remedies this issue and has better power than the DWH test. Simulation studies reveal that our test achieves near-oracle performance in detecting endogeneity.
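For readers new to the DWH test, the following is a minimal sketch of its classical low-dimensional form (the Hausman contrast). It is not the high-dimensional test studied here, nor the new test proposed in this paper. The sketch makes several assumptions: one endogenous regressor d, one instrument z, mean-zero data with no intercept or other covariates, and homoskedastic errors. All function and variable names are hypothetical. The statistic contrasts two estimates. OLS is efficient under exogeneity but inconsistent under endogeneity, while IV is consistent in both cases. Their squared difference, scaled by the difference of their variances, is referred to a chi-squared distribution with one degree of freedom.

```python
import math
import random


def chi2_1_sf(h):
    """Upper-tail probability P(chi^2_1 > h), using P(Z^2 > h) = erfc(sqrt(h / 2))."""
    return math.erfc(math.sqrt(h / 2.0))


def dwh_test(y, d, z):
    """Classical Hausman-form DWH test; H0: d is exogenous.

    Assumes (hypothetically) mean-zero data, a single endogenous regressor d,
    a single instrument z, no other covariates, and homoskedastic errors.
    Returns (statistic, p_value).
    """
    n = len(y)
    s_dd = sum(di * di for di in d)
    s_dy = sum(di * yi for di, yi in zip(d, y))
    s_zd = sum(zi * di for zi, di in zip(z, d))
    s_zy = sum(zi * yi for zi, yi in zip(z, y))
    s_zz = sum(zi * zi for zi in z)

    beta_ols = s_dy / s_dd  # efficient under H0, inconsistent if d is endogenous
    beta_iv = s_zy / s_zd   # consistent under both H0 and the alternative

    # Common error-variance estimate from OLS residuals (consistent under H0);
    # using one sigma^2 for both variances keeps the denominator nonnegative.
    sigma2 = sum((yi - beta_ols * di) ** 2 for yi, di in zip(y, d)) / (n - 1)

    var_ols = sigma2 / s_dd
    var_iv = sigma2 * s_zz / s_zd ** 2  # = sigma2 / sum(dhat^2), dhat = first-stage fit
    stat = (beta_iv - beta_ols) ** 2 / (var_iv - var_ols)
    return stat, chi2_1_sf(stat)


def simulate(n, beta, gamma, rho, seed):
    """Draw y = beta*d + e and d = gamma*z + v, where corr(e, v) = rho sets endogeneity."""
    rng = random.Random(seed)
    y, d, z = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)
        vi = rng.gauss(0.0, 1.0)
        ei = rho * vi + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        di = gamma * zi + vi
        z.append(zi)
        d.append(di)
        y.append(beta * di + ei)
    return y, d, z


if __name__ == "__main__":
    # rho = 0.0: d is exogenous; rho = 0.5: d is endogenous.
    for rho in (0.0, 0.5):
        stat, p = dwh_test(*simulate(2000, 1.0, 1.0, rho, seed=1))
        print(f"rho={rho}: DWH={stat:.2f}, p={p:.3g}")
```

In this toy setting there are no high-dimensional covariates to control for. The paper's point concerns what happens to this contrast once many covariates must be handled, and the simple form above does not capture that.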
