Truncated Linear Regression in High Dimensions

As in standard linear regression, in truncated linear regression we are given access to observations $(A_i, y_i)_i$ whose dependent variable equals $y_i = A_i^{\rm T} \cdot x^* + \eta_i$, where $x^*$ is some fixed unknown vector of interest and $\eta_i$ is independent noise; except we are only given an observation if its dependent variable $y_i$ lies in some "truncation set" $S \subset \mathbb{R}$. The goal is to recover $x^*$ under some favorable conditions on the $A_i$'s and the noise distribution. We prove that there exists a computationally and statistically efficient method for recovering $k$-sparse $n$-dimensional vectors $x^*$ from $m$ truncated samples, which attains an optimal $\ell_2$ reconstruction error of $O(\sqrt{(k \log n)/m})$. As a corollary, our guarantees imply a computationally efficient and information-theoretically optimal algorithm for compressed sensing with truncation, which may arise from measurement saturation effects. Our result follows from a statistical and computational analysis of the Stochastic Gradient Descent (SGD) algorithm for solving a natural adaptation of the LASSO optimization problem that accommodates truncation. This generalizes the works of both: (1) [Daskalakis et al. 2018], where no regularization is needed because the data are low-dimensional, and (2) [Wainwright 2009], where the objective function is simple because there is no truncation. To handle truncation and high-dimensionality simultaneously, we develop new techniques that not only generalize the existing ones but are, we believe, of independent interest.
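A minimal sketch of the kind of procedure described above, assuming standard Gaussian noise and an interval truncation set $S = [\texttt{lower}, \texttt{upper}]$: per-sample proximal SGD on the $\ell_1$-regularized negative log-likelihood of the truncated Gaussian, with soft-thresholding playing the role of the LASSO penalty. The function name, step size, and regularization weight below are illustrative choices, not the paper's exact algorithm or constants.

```python
import math
import numpy as np

def phi(t):
    """Standard normal pdf."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def Phi(t):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def truncated_lasso_sgd(A, y, lower, upper, lam=0.05, lr=0.01, epochs=50, seed=0):
    """Proximal SGD on the l1-regularized truncated-Gaussian negative
    log-likelihood.  For a sample (a, y) with mu = a.x, the per-sample NLL is
    (y - mu)^2 / 2 + log Z(mu),  Z(mu) = Phi(upper - mu) - Phi(lower - mu),
    i.e. least squares plus a correction for conditioning on y landing in S."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(epochs):
        for i in rng.permutation(m):
            a, yi = A[i], y[i]
            mu = float(a @ x)
            Z = max(Phi(upper - mu) - Phi(lower - mu), 1e-12)
            # Gradient of the per-sample NLL in x: the usual residual term
            # plus d/dmu log Z(mu) = (phi(lower-mu) - phi(upper-mu)) / Z.
            g = (-(yi - mu) + (phi(lower - mu) - phi(upper - mu)) / Z) * a
            x -= lr * g
            # Proximal step for the l1 penalty (soft-thresholding).
            x = np.sign(x) * np.maximum(np.abs(x) - lr * lam, 0.0)
    return x
```

Note that with $S = \mathbb{R}$ the correction term vanishes ($Z = 1$, both pdf terms cancel) and the update reduces to ordinary SGD on the LASSO objective, matching the untruncated setting of [Wainwright 2009].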

[1] Richard G. Baraniuk, et al. A simple proof that random matrices are democratic, 2009, arXiv.

[2] Helmut Schneider. Truncated and Censored Samples from Normal Populations, 1986.

[3] Richard Breen, et al. Regression Models: Censored, Sample Selected, or Truncated Data, 1996.

[4] Jerry A. Hausman, et al. Social Experimentation, Truncated Distributions, and Efficient Estimation, 1977.

[5] T. Amemiya. Regression Analysis when the Dependent Variable Is Truncated Normal, 1973, Econometrica.

[6] P. Schmidt, et al. Limited-Dependent and Qualitative Variables in Econometrics, 1984.

[7] M. Rudelson, et al. Non-asymptotic theory of random matrices: extreme singular values, 2010, arXiv:1003.2990.

[8] Richard G. Baraniuk, et al. Democracy in Action: Quantization, Saturation, and Compressive Sensing, 2011.

[9] C. B. Morgan. Truncated and Censored Samples: Theory and Applications, 1993.

[10] Sylvain Chevillard, et al. The functions erf and erfc computed with arbitrary precision and explicit error bounds, 2009, Inf. Comput.

[11] Christos Tzamos, et al. Computationally and Statistically Efficient Truncated Regression, 2020, COLT.

[12] E. Candès, et al. Stable signal recovery from incomplete and inaccurate measurements, 2005, arXiv:math/0503066.

[13] D. McFadden, et al. The method of simulated scores for the estimation of LDV models, 1998.

[14] Stephen P. Boyd, et al. Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.

[15] D. Donoho. For most large underdetermined systems of linear equations the minimal $\ell_1$-norm solution is also the sparsest solution, 2006.

[16] J. Tobin. Estimation of Relationships for Limited Dependent Variables, 1958.

[17] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[18] Michael Keane, et al. Simulation estimation for panel data models with limited dependent variables, 1993.

[19] K. Pearson, et al. On the Generalised Probable Error in Multiple Normal Correlation, 1908.

[20] Narayanaswamy Balakrishnan, et al. The Art of Progressive Censoring, 2014.

[21] Francis Galton, et al. An examination into the registered speeds of American trotting horses, with remarks on their value as hereditary data, 1898, Proceedings of the Royal Society of London.