Balanced estimation for high-dimensional measurement error models

Noisy and missing data are common in real applications, so the observed covariates often contain measurement errors. Despite rapid progress on model selection with contaminated covariates in high dimensions, methodology that performs well simultaneously in prediction, variable selection, and computation remains largely unexplored. In this paper, we propose a new method, called balanced estimation, for high-dimensional error-in-variables regression. It aims to achieve an ideal balance between prediction and variable selection under both additive and multiplicative measurement errors. The method combines the strengths of the nearest positive semi-definite projection and of combined L1 and concave regularization, and can therefore be solved efficiently by a coordinate optimization algorithm. We also provide theoretical guarantees for the proposed methodology by establishing oracle prediction and estimation error bounds equivalent to those for the Lasso with clean data, as well as an explicit and asymptotically vanishing bound on the false sign rate, which controls overfitting, a serious problem under measurement errors. Our numerical studies show that improved variable selection in turn improves prediction and estimation performance under measurement errors.
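As a rough illustration of how combined L1 and concave regularization can be handled by coordinate optimization, the sketch below minimizes 0.5 * b' Sigma b - rho' b + lam0 * ||b||_1 + sum_j p_H(|b_j|) one coordinate at a time. It is not the paper's implementation. Several things are assumptions made for this example: the hard-thresholding form of the concave penalty p_H, the unit diagonal of `sigma`, the names `threshold`, `coordinate_descent`, `lam0`, and `lam`, and the toy inputs. The matrix `sigma` stands in for a surrogate Gram matrix that has already been projected onto the positive semi-definite cone; the projection step itself is not shown.

```python
def threshold(z, lam0, lam):
    """Minimizer over b of 0.5*(z - b)**2 + lam0*|b| + p_H(|b|),
    where p_H(t) = 0.5*(lam**2 - max(lam - t, 0)**2) is an assumed
    hard-thresholding penalty. The result is a soft-thresholded value
    that is set to zero unless |z| exceeds lam0 + lam."""
    if abs(z) <= lam0 + lam:
        return 0.0
    sign = 1.0 if z > 0 else -1.0
    return sign * (abs(z) - lam0)


def coordinate_descent(sigma, rho, lam0, lam, sweeps=100, tol=1e-10):
    """Cyclic coordinate descent on the penalized quadratic objective.
    Assumes sigma is symmetric, positive semi-definite, and has unit
    diagonal (hypothetical standardization for this sketch)."""
    p = len(rho)
    beta = [0.0] * p
    for _ in range(sweeps):
        max_change = 0.0
        for j in range(p):
            # Partial residual correlation with coordinate j held out.
            z = rho[j] - sum(sigma[j][k] * beta[k] for k in range(p) if k != j)
            new = threshold(z, lam0, lam)
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


# Toy inputs: rho = sigma @ (2, 0), so only the first covariate is active.
sigma = [[1.0, 0.5], [0.5, 1.0]]
rho = [2.0, 1.0]
beta = coordinate_descent(sigma, rho, lam0=0.1, lam=0.5)
```

In this toy run the second coefficient is thresholded to exactly zero, while the first is shrunk only by `lam0`. This reflects the intended division of labor: the L1 term contributes mild shrinkage and the concave term removes small coefficients.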
