Robust Linear Regression Analysis - A Greedy Approach

The task of robust linear estimation in the presence of outliers is of particular importance in signal processing, statistics and machine learning. Although the problem was posed decades ago and has been addressed with methods that are now considered classical, it has recently attracted renewed attention in the context of sparse modeling, where several notable contributions have been made. In the present manuscript, a new approach is considered in the framework of greedy algorithms. The noise is split into two components: a) the bounded inlier noise and b) the outliers, which are explicitly modeled by employing sparsity arguments. Based on this scheme, a novel efficient algorithm, the Greedy Algorithm for Robust Denoising (GARD), is derived. GARD alternates between a least-squares optimization criterion and an Orthogonal Matching Pursuit (OMP) selection step that identifies the outliers. The case where only outliers are present is studied separately; there, bounds on the Restricted Isometry Property guarantee that the recovery of the signal via GARD is exact. Moreover, theoretical results are discussed concerning convergence, the recovery of the support of the sparse outlier vector, and the derivation of error bounds in the case of additional bounded noise. Finally, we provide extensive simulations, which demonstrate the comparative advantages of the new technique.
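The alternation described above can be illustrated with a minimal sketch. It is not the authors' reference implementation: it assumes the model y = X·theta + u + eta, with u a sparse outlier vector and the inlier noise bounded as ||eta||_2 <= eps, and it assumes that the selection step picks the largest-magnitude residual entry and adds the matching identity column to the least-squares problem. The names `gard_sketch`, `eps` and `max_outliers` are hypothetical, as are the toy data.

```python
import numpy as np


def gard_sketch(X, y, eps, max_outliers=None):
    """Greedy robust regression sketch in the spirit of GARD.

    Assumed model: y = X @ theta + u + eta, where u is sparse (outliers)
    and eta is inlier noise with ||eta||_2 <= eps.
    """
    n, m = X.shape
    if max_outliers is None:
        # Assumption: cap the support so the augmented matrix has at most n columns.
        max_outliers = n - m
    support = []          # indices flagged as outliers, in selection order
    A = X.copy()          # augmented matrix [X, e_j1, e_j2, ...]
    while True:
        # Least-squares step over regression coefficients and outlier values.
        z, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ z
        if np.linalg.norm(r) <= eps or len(support) >= max_outliers:
            break
        # OMP-style selection step: largest residual entry not yet selected.
        score = np.abs(r)
        if support:
            score[np.array(support, dtype=int)] = 0.0
        j = int(np.argmax(score))
        support.append(j)
        e_j = np.zeros((n, 1))
        e_j[j, 0] = 1.0
        A = np.hstack([A, e_j])
    theta = z[:m]
    u = np.zeros(n)
    if support:
        u[np.array(support, dtype=int)] = z[m:]
    return theta, u, sorted(support)


# Toy data (hypothetical): a straight line with two gross outliers, no inlier noise.
t = np.linspace(-1.0, 1.0, 21)
X = np.column_stack([np.ones_like(t), t])
theta_true = np.array([1.0, 2.0])
u_true = np.zeros(21)
u_true[5] = 8.0
u_true[14] = -6.0
y = X @ theta_true + u_true

theta_hat, u_hat, support = gard_sketch(X, y, eps=1e-8)
print("estimated support:", support)
print("estimated theta:", theta_hat)
```

In this sketch the threshold `eps` plays the role of the bound on the inlier noise: the loop stops as soon as the residual norm falls below it, so whatever remains unexplained is attributed to inlier noise rather than to further outliers.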
