Endogeneity in High Dimensions

Most work in high-dimensional statistics assumes that none of the regressors is correlated with the regression error, that is, that all regressors are exogenous. Yet endogeneity can arise incidentally when regressors are drawn from a large pool in a high-dimensional regression. Such incidental endogeneity makes the penalized least-squares method inconsistent and can lead to false scientific discoveries. We give a necessary condition for the model selection consistency of a general class of penalized regression methods, which allows us to prove the inconsistency claim formally. To cope with incidental endogeneity, we construct a novel penalized focused generalized method of moments (FGMM) criterion function. FGMM achieves dimension reduction while applying instrumental variable methods. We show that it possesses the oracle property even in the presence of endogenous predictors, and that the solution is also a near global minimum under the over-identification assumption. Finally, we show how semiparametric efficiency of estimation can be achieved via a two-step approach.
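The inconsistency of least squares under endogeneity, and the role of instruments in repairing it, can be illustrated with a small simulation. The sketch below is not the FGMM procedure: it is a one-regressor toy example with no penalty, and the data-generating process, the coefficient values, and the function names (`simulate`, `ols_slope`, `iv_slope`) are all assumptions made for illustration.

```python
import numpy as np


def simulate(n, seed=0):
    """Draw a toy sample with one endogenous regressor (assumed design)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)                  # instrument: independent of the error
    u = rng.normal(size=n)                  # regression error
    x = w + 0.8 * u + rng.normal(size=n)    # regressor correlated with u (endogenous)
    y = 2.0 * x + u                         # true slope is 2.0
    return y, x, w


def ols_slope(y, x):
    """Least-squares slope without intercept: (x'x)^{-1} x'y."""
    return float(x @ y / (x @ x))


def iv_slope(y, x, w):
    """Simple instrumental variable slope: (w'x)^{-1} w'y."""
    return float(w @ y / (w @ x))


y, x, w = simulate(200_000)
# In this design Var(x) = 1 + 0.64 + 1 = 2.64 and Cov(x, u) = 0.8, so the
# least-squares slope converges to 2 + 0.8 / 2.64 (about 2.30), not to 2.
print("OLS slope:", ols_slope(y, x))
# The instrument is uncorrelated with u, so the IV slope converges to 2.
print("IV slope: ", iv_slope(y, x, w))
```

In this toy design the least-squares bias does not shrink as the sample grows, whereas the moment condition built from the instrument identifies the true slope. That contrast is the motivation for replacing the least-squares loss with a moment-based criterion.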
