Sketching for M-Estimators: A Unified Approach to Robust Regression

We give algorithms for the M-estimators $\min_x \|Ax - b\|_G$, where $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, and where $\|y\|_G$ for $y \in \mathbb{R}^n$ is specified by a cost function $G : \mathbb{R} \to \mathbb{R}_{\geq 0}$, with $\|y\|_G \equiv \sum_i G(y_i)$. The M-estimators generalize $\ell_p$ regression, for which $G(x) = |x|^p$. We first show that the Huber measure can be computed up to relative error $\epsilon$ in $O(\mathrm{nnz}(A) \log n + \mathrm{poly}(d(\log n)/\epsilon))$ time, where $\mathrm{nnz}(A)$ denotes the number of non-zero entries of the matrix $A$. Huber is arguably the most widely used M-estimator, enjoying the robustness properties of $\ell_1$ as well as the smoothness properties of $\ell_2$. We next develop algorithms for general M-estimators. We analyze the M-sketch, which is a variation of a sketch introduced by Verbin and Zhang in the context of estimating the earth-mover distance. We show that the M-sketch can be used much more generally for sketching any M-estimator provided $G$ has growth that is at least linear and at most quadratic. Using the M-sketch we solve the M-estimation problem in $O(\mathrm{nnz}(A) + \mathrm{poly}(d \log n))$ time for any such $G$ that is convex, making a single pass over the matrix and finding a solution whose residual error is within a constant factor of optimal, with high probability.
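
For concreteness, the following minimal NumPy sketch evaluates the measure $\|y\|_G$ when $G$ is the Huber cost. The threshold tau, its normalization, and the helper names are our own illustrative assumptions (the abstract leaves the Huber parameterization implicit); the snippet illustrates the objective being minimized, not the paper's sketching algorithm.

    import numpy as np

    def huber(a, tau=1.0):
        # One common parameterization of the Huber cost G (the abstract does
        # not fix one; tau > 0 is a hypothetical threshold): quadratic for
        # |a| <= tau, linear with slope 1 beyond, continuous at the boundary.
        a = np.abs(a)
        return np.where(a <= tau, a**2 / (2 * tau), a - tau / 2)

    def m_measure(y, G=huber):
        # ||y||_G = sum_i G(y_i), the M-estimator measure defined above.
        return float(np.sum(G(np.asarray(y, dtype=float))))

    # Example: the Huber measure of the residual y = Ax - b, evaluated at
    # a least-squares solution x (a natural baseline, not the paper's
    # algorithm).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 5))
    b = rng.standard_normal(100)
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    print(m_measure(A @ x - b))

In this parameterization $G$ grows quadratically near zero and linearly in the tails, which is exactly the at-least-linear, at-most-quadratic growth condition under which the M-sketch result applies.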

[1]  Anirban Dasgupta et al. Sampling algorithms and coresets for ℓp regression, 2007, SODA '08.

[2]  Shang-Hua Teng et al. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems, 2003, STOC '04.

[3]  Daniel M. Kane et al. Sparser Johnson-Lindenstrauss Transforms, 2010, JACM.

[4]  David P. Woodruff et al. Optimal approximations of the frequency moments of data streams, 2005, STOC '05.

[5]  Wojciech Niemiro. Asymptotics for M-estimators defined by convex minimization, 1992.

[6]  Zhengyou Zhang et al. Parameter estimation techniques: a tutorial with application to conic fitting, 1997, Image Vis. Comput.

[7]  Gary L. Miller et al. Iterative Approaches to Row Sampling, 2012, arXiv.

[8]  Tamás Sarlós et al. Improved Approximation Algorithms for Large Matrices via Random Projections, 2006, FOCS '06.

[9]  Bernard Chazelle et al. Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform, 2006, STOC '06.

[10]  Gary L. Miller et al. Approaching Optimality for Solving SDD Linear Systems, 2010, FOCS '10.

[11]  Petros Drineas et al. Fast Monte Carlo Algorithms for Matrices I: Approximating Matrix Multiplication, 2006, SIAM J. Comput.

[12]  Huy L. Nguyen et al. OSNAP: Faster Numerical Linear Algebra Algorithms via Sparser Subspace Embeddings, 2012, FOCS '13.

[13]  Santosh S. Vempala et al. Adaptive Sampling and Fast Low-Rank Matrix Approximation, 2006, APPROX-RANDOM.

[14]  Petros Drineas et al. Fast Monte Carlo Algorithms for Matrices III: Computing a Compressed Approximate Matrix Decomposition, 2004.

[15]  Michael W. Mahoney. Randomized Algorithms for Matrices and Data, 2011, Found. Trends Mach. Learn.

[16]  Nir Ailon et al. An almost optimal unrestricted fast Johnson-Lindenstrauss transform, 2010, SODA '11.

[17]  Frederick R. Forst et al. On robust estimation of the location parameter, 1980.

[18]  S. Muthukrishnan et al. Faster least squares approximation, 2007, Numerische Mathematik.

[19]  S. Muthukrishnan et al. Subspace Sampling and Relative-Error Matrix Approximation: Column-Based Methods, 2006, APPROX-RANDOM.

[20]  David P. Woodruff et al. Fast approximation of matrix coherence and statistical leverage, 2011, ICML.

[21]  L. M. M.-T. Theory of Probability, 1929, Nature.

[22]  P. J. Huber. Robust Estimation of a Location Parameter, 1964.

[23]  Gary L. Miller et al. A Nearly-m log n Time Solver for SDD Linear Systems, 2011, FOCS '11.

[24]  S. Muthukrishnan et al. Sampling algorithms for ℓ2 regression and applications, 2006, SODA '06.

[25]  David R. Musicant et al. Robust Linear and Support Vector Regression, 2000, IEEE Trans. Pattern Anal. Mach. Intell.

[26]  S. Muthukrishnan et al. Subspace Sampling and Relative-Error Matrix Approximation: Column-Row-Based Methods, 2006, ESA.

[27]  Sjoerd Dirksen et al. Toward a unified theory of sparse dimensionality reduction in Euclidean space, 2013, STOC.

[28]  Petros Drineas et al. Fast Monte Carlo Algorithms for Matrices II: Computing a Low-Rank Approximation to a Matrix, 2004.

[29]  Christos Boutsidis et al. Improved Matrix Algorithms via the Subsampled Randomized Hadamard Transform, 2012, SIAM J. Matrix Anal. Appl.

[30]  R. Ostrovsky et al. Zero-one frequency laws, 2010, STOC '10.

[31]  Dimitris Achlioptas et al. Fast computation of low-rank matrix approximations, 2007, JACM.

[32]  Qin Zhang et al. Rademacher-Sketch: A Dimensionality-Reducing Embedding for Sum-Product Norms, with an Application to Earth-Mover Distance, 2012, ICALP.

[33]  H. Jeffreys et al. Theory of Probability, 1896.

[34]  Antoine Guitton et al. Robust and stable velocity analysis using the Huber function, 1999.

[35]  David P. Woodruff et al. Low rank approximation and regression in input sparsity time, 2013, STOC '13.

[36]  Michael W. Mahoney et al. Quantile Regression for Large-Scale Applications, 2013, SIAM J. Sci. Comput.

[37]  W. B. Johnson et al. Extensions of Lipschitz mappings into Hilbert space, 1984.

[38]  Michael W. Mahoney et al. Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression, 2012, STOC '13.

[39]  Dimitris Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins, 2003, J. Comput. Syst. Sci.

[40]  Jiri Matousek. Lectures on Discrete Geometry, 2002, Graduate Texts in Mathematics.

[41]  Anirban Dasgupta et al. A sparse Johnson-Lindenstrauss transform, 2010, STOC '10.

[42]  Andreas Maurer. A bound on the deviation probability for sums of non-negative random variables, 2003.

[43]  Santosh S. Vempala et al. Matrix approximation and projective clustering via volume sampling, 2006, SODA '06.

[44]  David P. Woodruff et al. Subspace Embeddings and ℓp-Regression Using Exponential Random Variables, 2013, COLT.