Optimal Sketching Bounds for Sparse Linear Regression

We study oblivious sketching for $k$-sparse linear regression under various loss functions, such as an $\ell_p$ norm or a broad class of hinge-like loss functions that includes the logistic and ReLU losses. We show that for sparse $\ell_2$ norm regression, there is a distribution over oblivious sketches with $\Theta(k\log(d/k)/\varepsilon^2)$ rows, which is tight up to a constant factor. This extends to $\ell_p$ loss with an additional additive $O(k\log(k/\varepsilon)/\varepsilon^2)$ term in the upper bound. This establishes a surprising separation from the related sparse recovery problem, which is an important special case of sparse regression. For this problem, under the $\ell_2$ norm, we observe an upper bound of $O(k\log(d)/\varepsilon + k\log(k/\varepsilon)/\varepsilon^2)$ rows, showing that sparse recovery is strictly easier to sketch than sparse regression. For sparse regression under hinge-like loss functions, including sparse logistic and sparse ReLU regression, we give the first known sketching bounds that achieve $o(d)$ rows, showing that $O(\mu^2 k\log(\mu n d/\varepsilon)/\varepsilon^2)$ rows suffice, where $\mu$ is a natural complexity parameter needed to obtain relative error bounds for these loss functions. We again show that this dimension is tight, up to lower-order terms and the dependence on $\mu$. Finally, we show that similar sketching bounds can be achieved for LASSO regression, a popular convex relaxation of sparse regression, in which one aims to minimize $\|Ax-b\|_2^2+\lambda\|x\|_1$ over $x\in\mathbb{R}^d$. We show that a sketching dimension of $O(\log(d)/(\lambda\varepsilon)^2)$ suffices and that the dependence on $d$ and $\lambda$ is tight.
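To make the setting concrete, below is a minimal numerical sketch for the $\ell_2$ case, assuming a plain Gaussian sketching matrix and a brute-force $k$-sparse solver (feasible only for tiny $d$); the synthetic instance, solver, and parameter choices are illustrative assumptions and not the constructions analyzed here. The oblivious sketch $S$ is drawn without looking at $(A, b)$, and the sparse regression problem is then solved on the much smaller pair $(SA, Sb)$.

```python
# Illustrative only: oblivious sketching for k-sparse l2 regression with a
# Gaussian sketch and a brute-force sparse solver (not the paper's construction).
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, d, k, eps = 500, 20, 2, 0.5

# Synthetic k-sparse regression instance.
A = rng.standard_normal((n, d))
x_true = np.zeros(d)
x_true[:k] = [3.0, -2.0]
b = A @ x_true + 0.1 * rng.standard_normal(n)

# Oblivious sketch: S is drawn independently of (A, b), with
# m on the order of k log(d/k) / eps^2 rows, matching the l2 upper bound.
m = int(np.ceil(k * np.log(d / k) / eps**2))
S = rng.standard_normal((m, n)) / np.sqrt(m)
SA, Sb = S @ A, S @ b

def best_k_sparse(M, y, k):
    """Brute-force k-sparse least squares: try every support of size k."""
    best_cost, best_x = np.inf, None
    for supp in itertools.combinations(range(M.shape[1]), k):
        cols = list(supp)
        coef, *_ = np.linalg.lstsq(M[:, cols], y, rcond=None)
        x = np.zeros(M.shape[1])
        x[cols] = coef
        cost = np.linalg.norm(M @ x - y)
        if cost < best_cost:
            best_cost, best_x = cost, x
    return best_x

x_sk = best_k_sparse(SA, Sb, k)   # minimize ||SAx - Sb||_2 over k-sparse x
x_ex = best_k_sparse(A, b, k)     # exact k-sparse minimizer of ||Ax - b||_2
print("sketched solution cost:", np.linalg.norm(A @ x_sk - b))
print("optimal cost:          ", np.linalg.norm(A @ x_ex - b))
```

The final comparison is informal: the intended guarantee for a sketch with $\Theta(k\log(d/k)/\varepsilon^2)$ rows is that the minimizer of the sketched problem has cost on the original data within a $(1+\varepsilon)$ factor of the true $k$-sparse optimum.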
