Robust Regression via Heuristic Corruption Thresholding and Its Adaptive Estimation Variation

The prevalence of data noise and corruption has recently drawn increasing attention to robust least-squares regression (RLSR), the fundamental problem of learning reliable regression coefficients when the response variables can be arbitrarily corrupted. Until now, the following challenges could not be handled concurrently: (1) a rigorous recovery guarantee for the regression coefficients, (2) estimation of the corruption ratio parameter, and (3) scaling to massive datasets. This article proposes a novel Robust regression algorithm via Heuristic Corruption Thresholding (RHCT) that addresses all three challenges concurrently. Specifically, the algorithm alternates between optimizing the regression coefficients and estimating the optimal uncorrupted set via heuristic thresholding, without a pre-defined corruption ratio parameter, until convergence. Moreover, to make corruption estimation more efficient on large-scale data, a Robust regression algorithm via Adaptive Corruption Thresholding (RACT) is proposed, which determines the size of the uncorrupted set through a novel adaptive search method without iterating over the data samples exhaustively. In addition, we prove that our algorithms enjoy strong guarantees analogous to those of state-of-the-art methods in terms of convergence rates and recovery guarantees. Extensive experiments demonstrate that the new methods outperform existing methods in recovering both the regression coefficients and the uncorrupted sets, with very competitive efficiency.
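
The alternating scheme described above can be illustrated with a minimal sketch. It is not the RHCT or RACT procedure itself, whose thresholding rules are not given in this abstract. The sketch assumes a simple stand-in rule: a sample is treated as uncorrupted when its absolute residual is within `k` times a median-based scale estimate. The function name `robust_fit` and the parameters `k` and `max_iter` are hypothetical and chosen only for illustration.

```python
import numpy as np


def robust_fit(X, y, k=3.0, max_iter=50):
    """Alternate between least squares on the presumed-clean set and
    re-estimating that set by thresholding residuals.

    Illustrative only: the median-based cut-off is an assumed stand-in,
    not the heuristic thresholding rule of the paper.
    """
    n, d = X.shape
    clean = np.ones(n, dtype=bool)  # start by trusting every sample
    beta = np.zeros(d)
    for _ in range(max_iter):
        # Step 1: ordinary least squares restricted to the current clean set.
        beta, *_ = np.linalg.lstsq(X[clean], y[clean], rcond=None)
        # Step 2: recompute residuals on ALL samples and threshold them.
        # The cut-off is derived from the data, so the caller supplies
        # no corruption ratio.
        r = np.abs(y - X @ beta)
        scale = 1.4826 * np.median(r) + 1e-12  # assumed robust scale estimate
        new_clean = r <= k * scale
        if np.array_equal(new_clean, clean):
            break  # the estimated uncorrupted set has stopped changing
        clean = new_clean
    return beta, clean
```

Calling `beta, clean = robust_fit(X, y)` returns the coefficient estimate together with a boolean mask of the samples judged uncorrupted. Because the cut-off relies on the median residual, this stand-in is only meaningful when fewer than half of the responses are corrupted.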
