Iterative randomized robust linear regression

A promising approach when dealing with massive data sets is to apply randomized dimensionality reduction and then operate in the lower-dimensional space. This paper deals with the randomized linear regression task in the case where the available data are sporadically corrupted. Instead of relying on the minimization of norms that are robust to outliers, an alternative route is taken. Building on the observation that outliers can be detected while operating in a low-dimensional embedding produced by randomized projections, a mechanism for iteratively detecting and excluding corrupted data is proposed. As a result, the linear regression is performed using a conventional least-squares (LS) approximation, without the need to resort to linear programming-based ℓ1-norm minimization.
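To make the idea concrete, the following is a minimal sketch of such an iterative detect-and-exclude scheme, written as an illustrative reconstruction rather than the authors' exact algorithm: the regression problem is sketched with a Gaussian random projection, an ordinary LS fit is computed in the reduced dimension, the rows with the largest residuals are flagged as corrupted and dropped, and the process repeats before a final LS fit on the retained rows. The function name, the sketch size, and the outlier fraction are illustrative assumptions.

```python
import numpy as np


def iterative_robust_ls(A, b, sketch_dim=None, n_iter=5, outlier_frac=0.05, seed=None):
    """Illustrative sketch (not the paper's exact algorithm): alternate between
    a randomized-sketch LS fit and removal of the rows with the largest
    residuals, then solve a conventional LS problem on the retained rows."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    k = sketch_dim if sketch_dim is not None else min(n, 4 * d)  # assumed sketch size

    keep = np.ones(n, dtype=bool)  # rows currently considered uncorrupted
    for _ in range(n_iter):
        m = int(keep.sum())
        # Gaussian random projection of the currently trusted rows.
        S = rng.standard_normal((k, m)) / np.sqrt(k)
        x, *_ = np.linalg.lstsq(S @ A[keep], S @ b[keep], rcond=None)

        # Residuals of all rows against the current fit; the largest ones are
        # treated as corrupted and excluded from the next iteration.
        r = np.abs(A @ x - b)
        keep = r <= np.quantile(r, 1.0 - outlier_frac)

    # Final conventional LS fit on the rows deemed clean.
    x, *_ = np.linalg.lstsq(A[keep], b[keep], rcond=None)
    return x, keep
```

In this sketch the Gaussian projection is only a placeholder; structured sketches such as the subsampled randomized Hadamard transform could be substituted to lower the cost of forming the embedding.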
