Randomized Sketches of Convex Programs With Sharp Guarantees

Random projection (RP) is a classical technique for reducing storage and computational costs. We analyze RP-based approximations of convex programs, in which the original optimization problem is approximated by solving a lower-dimensional problem. Such dimensionality reduction is essential in computation-limited settings, since the complexity of general convex programming can be quite high (e.g., cubic for quadratic programs, and substantially higher for semidefinite programs). In addition to computational savings, RP is also useful for reducing memory usage and has attractive properties for privacy-preserving optimization. We prove that the approximation ratio of this procedure can be bounded in terms of the geometry of the constraint set. For a broad class of RPs, including those based on various sub-Gaussian distributions as well as randomized Hadamard and Fourier transforms, the data matrix defining the cost function can be projected to a dimension proportional to the squared Gaussian width of the tangent cone of the constraint set at the original solution. This effective dimension of the convex program is often substantially smaller than the original dimension. We illustrate consequences of our theory for various cases, including unconstrained and $\ell_{1}$-constrained least squares, support vector machines, and low-rank matrix estimation, and we discuss implications for privacy-preserving optimization as well as connections with denoising and compressed sensing.
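To make the setup concrete, the following is a minimal sketch (not the paper's implementation) of RP-based approximation for unconstrained least squares: the $n \times d$ data matrix is multiplied by an $m \times n$ Gaussian sketching matrix with $m \ll n$, and the reduced problem is solved instead. The function name and the hand-picked sketch size `m` are illustrative assumptions; the paper's theory ties the required $m$ to the squared Gaussian width of the tangent cone at the original solution.

```python
import numpy as np

def sketched_least_squares(A, b, m, seed=None):
    """Approximate argmin_x ||Ax - b||_2 by solving the sketched problem
    argmin_x ||S(Ax - b)||_2, where S is an m x n Gaussian sketch.

    Illustrative only: m is supplied by the caller, whereas the theory
    suggests taking m proportional to the squared Gaussian width of the
    tangent cone of the constraint set at the original solution.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Sub-Gaussian (here: Gaussian) sketching matrix, scaled so E[S^T S] = I_n.
    S = rng.standard_normal((m, n)) / np.sqrt(m)
    # Project data and targets, then solve the much smaller m x d problem.
    x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x_sketch

# Usage: reduce n = 10000 observations to an m = 200 row sketch.
rng = np.random.default_rng(0)
A = rng.standard_normal((10000, 50))
x_true = rng.standard_normal(50)
b = A @ x_true + 0.1 * rng.standard_normal(10000)
x_hat = sketched_least_squares(A, b, m=200, seed=1)
print(np.linalg.norm(x_hat - x_true))
```

The same pattern extends to constrained programs (e.g., $\ell_{1}$-constrained least squares) by solving the sketched objective over the original constraint set; other sub-Gaussian or randomized Hadamard/Fourier sketches can replace the Gaussian matrix above.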
