Errors-in-variables models with dependent measurements

Suppose that we observe $y \in \mathbb{R}^n$ and $X \in \mathbb{R}^{n \times m}$ in the following errors-in-variables model: \begin{eqnarray*} y & = & X_0 \beta^* + \epsilon \\ X & = & X_0 + W \end{eqnarray*} where $X_0$ is an $n \times m$ design matrix with independent subgaussian row vectors, $\epsilon \in \mathbb{R}^n$ is a noise vector, and $W$ is a mean-zero $n \times m$ random noise matrix with independent subgaussian column vectors, independent of $X_0$ and $\epsilon$. This model differs significantly from those analyzed in the literature in that we allow the measurement error for each covariate to be dependent across its $n$ observations. Such error structures appear in the scientific literature, for instance, when modeling trial-to-trial fluctuations in response strength shared across a set of neurons. We establish consistency in estimating $\beta^*$ and obtain rates of convergence in the $\ell_q$ norm, where $q = 1, 2$ for a Lasso-type estimator and $q \in [1, 2]$ for a Dantzig-type conic programming estimator. We show error bounds that approach those of the regular Lasso and the Dantzig selector when the errors in $W$ tend to 0. We analyze the convergence rates of gradient descent methods for solving the nonconvex programs and show that the composite gradient descent algorithm is guaranteed to converge at a geometric rate to a neighborhood of the global minimizers, the size of which is bounded by the statistical error in the $\ell_2$ norm. Our analysis reveals interesting connections between computational and statistical efficiency and the concentration of measure phenomenon in random matrix theory. We provide simulation evidence illustrating the theoretical predictions.
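To make the estimation pipeline concrete, the following Python sketch simulates the additive errors-in-variables model with column-dependent measurement noise and fits $\beta^*$ by composite gradient descent on a nonconvex, Loh-and-Wainwright-style corrected Lasso objective. The AR(1) column covariance of $W$, the noise levels, the tuning parameters, and the crude $\ell_1$-ball rescaling step are illustrative assumptions, not the paper's exact estimator or algorithm.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal map of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def corrected_lasso_cgd(X, y, correction, lam, radius, n_iter=500):
    """Composite gradient descent for the corrected-Lasso objective
        min_beta  0.5 * beta' Gamma beta - gamma' beta + lam * ||beta||_1
        subject to ||beta||_1 <= radius,
    where Gamma = X'X/n - correction and gamma = X'y/n remove the bias that
    the measurement error W induces in the Gram matrix.  Gamma is typically
    indefinite when m > n, which is the source of the nonconvexity."""
    n, m = X.shape
    Gamma = X.T @ X / n - correction
    gamma = X.T @ y / n
    # Step size 1/L, with L the spectral norm of the symmetric Gamma.
    eta = 1.0 / np.abs(np.linalg.eigvalsh(Gamma)).max()
    beta = np.zeros(m)
    for _ in range(n_iter):
        beta = soft_threshold(beta - eta * (Gamma @ beta - gamma), eta * lam)
        # Simple rescaling onto the l1 ball (a stand-in for exact projection)
        # keeps the iterates in the region where the theory applies.
        l1 = np.abs(beta).sum()
        if l1 > radius:
            beta *= radius / l1
    return beta

rng = np.random.default_rng(0)
n, m, s = 200, 400, 5
beta_star = np.zeros(m)
beta_star[:s] = 1.0

# X_0 with independent subgaussian (here Gaussian) rows.
X0 = rng.standard_normal((n, m))

# W with independent columns, each an AR(1)-dependent vector across the
# n observations: column covariance B = tau^2 * (rho^{|i-j|})_{ij}.
rho, tau = 0.5, 0.2
B_chol = np.linalg.cholesky(
    rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n))))
W = tau * (B_chol @ rng.standard_normal((n, m)))

y = X0 @ beta_star + 0.5 * rng.standard_normal(n)
X = X0 + W

# E[W'W/n] = (tr(B)/n) * I_m = tau^2 * I_m for this noise model, so the
# Gram-matrix correction is diagonal and known here (estimated in practice).
correction = tau**2 * np.eye(m)

beta_hat = corrected_lasso_cgd(X, y, correction, lam=0.15,
                               radius=2.0 * np.abs(beta_star).sum())
print("l2 estimation error:", np.linalg.norm(beta_hat - beta_star))
```

The correction term is what makes the sketch honest to the setting: without it, the naive Lasso applied to $(X, y)$ is asymptotically biased; with it, the quadratic loss loses convexity, which is precisely the regime the convergence analysis above addresses.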
