On Coresets For Regularized Regression

We study the effect of norm-based regularization on the size of coresets for regression problems. Specifically, given a matrix $\mathbf{A} \in \mathbb{R}^{n \times d}$ with $n \gg d$, a vector $\mathbf{b} \in \mathbb{R}^n$, and $\lambda > 0$, we analyze the size of coresets for regularized regression problems of the form $\|\mathbf{Ax}-\mathbf{b}\|_p^r + \lambda\|\mathbf{x}\|_q^s$. Prior work has shown that for ridge regression (where $p,q,r,s=2$) one can obtain a coreset smaller than that of its unregularized counterpart, i.e., least squares regression (Avron et al.). We show that when $r \neq s$, no coreset for regularized regression can be smaller than the optimal coreset for the unregularized version. The well-known lasso problem falls into this category and hence does not admit a coreset smaller than the one for least squares regression. We propose a modified version of the lasso problem and obtain for it a coreset smaller than the one for least squares regression. We empirically show that the modified lasso also induces sparsity in the solution, similar to the original lasso. We further obtain smaller coresets for $\ell_p$ regression with $\ell_p$ regularization, and we extend our methods to multi-response regularized regression. Finally, we empirically demonstrate coreset performance for the modified lasso and for $\ell_1$ regression with $\ell_1$ regularization.
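To make the coreset notion concrete, the following is a minimal illustrative sketch (not the paper's construction) of row sampling by ridge leverage scores for the $p,q,r,s=2$ case: rows of $(\mathbf{A}, \mathbf{b})$ are sampled proportionally to $\tau_i = \mathbf{a}_i^\top(\mathbf{A}^\top\mathbf{A} + \lambda \mathbf{I})^{-1}\mathbf{a}_i$ and rescaled, and the regularizer $\lambda\|\mathbf{x}\|_2^2$ is kept exactly. Function names such as `ridge_coreset` are our own; the sample size and data here are arbitrary choices for demonstration.

```python
import numpy as np

def ridge_leverage_scores(A, lam):
    """tau_i = a_i^T (A^T A + lam*I)^{-1} a_i for each row a_i of A."""
    d = A.shape[1]
    Ginv = np.linalg.inv(A.T @ A + lam * np.eye(d))
    # Row-wise quadratic forms, i.e. diag(A Ginv A^T), without forming the n x n matrix.
    return np.einsum("ij,jk,ik->i", A, Ginv, A)

def ridge_coreset(A, b, lam, m, rng):
    """Sample m rows of (A, b) with probability proportional to ridge leverage
    scores, rescaled so the sampled quadratic form is unbiased."""
    tau = ridge_leverage_scores(A, lam)
    p = tau / tau.sum()
    idx = rng.choice(len(p), size=m, replace=True, p=p)
    w = 1.0 / np.sqrt(m * p[idx])  # importance-sampling rescaling weights
    return A[idx] * w[:, None], b[idx] * w

rng = np.random.default_rng(0)
n, d, lam, m = 5000, 10, 1.0, 500
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

As, bs = ridge_coreset(A, b, lam, m, rng)

# Ridge solutions on the full data and on the coreset (regularizer kept exactly).
x_full = np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ b)
x_core = np.linalg.solve(As.T @ As + lam * np.eye(d), As.T @ bs)

def cost(x):
    return np.linalg.norm(A @ x - b) ** 2 + lam * np.linalg.norm(x) ** 2

# Relative excess cost of the coreset solution, evaluated on the FULL data.
rel_err = cost(x_core) / cost(x_full) - 1
```

The coreset solution is evaluated on the full objective; since `x_full` is the exact minimizer, `rel_err` is nonnegative and shrinks as `m` grows.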

[1] L. Schulman et al. Universal ε-approximators for integrals, 2010, SODA '10.

[2] David P. Woodruff et al. Fast approximation of matrix coherence and statistical leverage, 2011, ICML.

[3] David P. Woodruff et al. Leveraging Well-Conditioned Bases: Streaming and Distributed Summaries in Minkowski p-Norms, 2018, ICML.

[4] Richard Peng et al. ℓp Row Sampling by Lewis Weights, 2014, ArXiv.

[5] Kirk Pruhs et al. On Coresets for Regularized Loss Minimization, 2019, ArXiv.

[6] David P. Woodruff et al. Sharper Bounds for Regularized Data Fitting, 2016, APPROX-RANDOM.

[7] David P. Woodruff et al. Optimal Deterministic Coresets for Ridge Regression, 2020, AISTATS.

[8] Michael W. Mahoney et al. Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments, 2015, Proceedings of the IEEE.

[9] S. Muthukrishnan et al. Sampling algorithms for l2 regression and applications, 2006, SODA '06.

[10] David P. Woodruff et al. Low rank approximation and regression in input sparsity time, 2012, STOC '13.

[11] David P. Woodruff. Sketching as a Tool for Numerical Linear Algebra, 2014, Found. Trends Theor. Comput. Sci.

[12] David P. Woodruff et al. Subspace embeddings for the L1-norm with applications, 2011, STOC '11.

[13] Ibrahim Jubran et al. Fast and Accurate Least-Mean-Squares Solvers for High Dimensional Data, 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14] David P. Woodruff et al. Subspace Embeddings and \(\ell_p\)-Regression Using Exponential Random Variables, 2013, COLT.

[15] Anirban Dasgupta et al. Sampling algorithms and coresets for ℓp regression, 2007, SODA '08.

[16] Michael Langberg et al. A unified framework for approximating and clustering data, 2011, STOC.

[17] Nikhil Srivastava et al. Twice-Ramanujan sparsifiers, 2008, STOC '09.

[18] Martin J. Wainwright et al. Randomized sketches of convex programs with sharp guarantees, 2014, IEEE International Symposium on Information Theory.

[19] S. Muthukrishnan et al. Faster least squares approximation, 2007, Numerische Mathematik.

[20] Anant Raj et al. Importance Sampling via Local Sensitivity, 2020, AISTATS.

[21] Sariel Har-Peled et al. On coresets for k-means and k-median clustering, 2004, STOC '04.

[22] Alexander J. Smola et al. Communication Efficient Coresets for Empirical Loss Minimization, 2015, UAI.

[23] Richard Peng et al. Lp Row Sampling by Lewis Weights, 2015, STOC.

[24] Michael D. Gordon et al. Regularized Least Absolute Deviations Regression and an Efficient Algorithm for Parameter Tuning, 2006, Sixth International Conference on Data Mining (ICDM'06).

[25] Pınar Tüfekci et al. Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods, 2014.

[26] Alessandro Panconesi et al. Concentration of Measure for the Analysis of Randomized Algorithms, 2009.

[27] Pankaj K. Agarwal et al. Approximating extent measures of points, 2004, JACM.

[28] David Haussler et al. Sphere Packing Numbers for Subsets of the Boolean n-Cube with Bounded Vapnik-Chervonenkis Dimension, 1995, J. Comb. Theory, Ser. A.

[29] Vladimir Braverman et al. New Frameworks for Offline and Streaming Coreset Constructions, 2016, ArXiv.

[30] David P. Woodruff et al. The Fast Cauchy Transform and Faster Robust Linear Regression, 2012, SIAM J. Comput.

[31] Andreas Krause et al. Practical Coreset Constructions for Machine Learning, 2017, arXiv:1703.06476.

[32] Jeff M. Phillips. Coresets and Sketches, 2016, ArXiv.

[33] Michael W. Mahoney et al. Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression, 2012, STOC '13.

[34] Dan Feldman et al. Coresets For Monotonic Functions with Applications to Deep Learning, 2018, ArXiv.