Paper Information - Robust Regression via Model Based Methods

Robust Regression via Model Based Methods

The mean squared error loss is widely used in many applications, including auto-encoders, multi-target regression, and matrix factorization, to name a few. Despite computational advantages due to its differentiability, it is not robust to outliers. In contrast, ℓp norms are known to be robust, but cannot be optimized via, e.g., stochastic gradient descent, as they are non-differentiable. We propose an algorithm inspired by so-called model-based optimization (MBO) [35, 36], which replaces a non-convex objective with a convex model function and alternates between optimizing the model function and updating the solution. We apply this to robust regression, proposing SADM, a stochastic variant of the Online Alternating Direction Method of Multipliers (OADM) [48] to solve the inner optimization in MBO. We show that SADM converges with the rate O(log T/T). Finally, we demonstrate experimentally (a) the robustness of ℓp norms to outliers and (b) the efficiency of our proposed model-based algorithms in comparison with gradient methods on autoencoders and multi-target regression.
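The robustness claim in the abstract can be illustrated with a toy case. The sketch below is not the paper's method (it involves neither MBO nor SADM); it only fits a single constant to a handful of made-up numbers, once under the squared loss and once under the ℓ1 (absolute) loss. It relies on two standard facts: the mean minimizes the sum of squared residuals and the median minimizes the sum of absolute residuals. The data values and function names are assumptions chosen for illustration.

```python
from statistics import mean, median


def squared_loss(c, ys):
    """Sum of squared residuals when every target is predicted by the constant c."""
    return sum((y - c) ** 2 for y in ys)


def absolute_loss(c, ys):
    """Sum of absolute residuals (l1 loss) for the constant prediction c."""
    return sum(abs(y - c) for y in ys)


# Hypothetical data: five clean targets, then the same data with one gross outlier.
clean = [1.0, 2.0, 3.0, 4.0, 5.0]
corrupted = clean[:-1] + [500.0]

# The mean minimizes the squared loss; the median minimizes the absolute loss.
mse_fit_clean = mean(clean)
l1_fit_clean = median(clean)
mse_fit_corrupted = mean(corrupted)
l1_fit_corrupted = median(corrupted)

print("squared-loss fit:", mse_fit_clean, "->", mse_fit_corrupted)
print("l1-loss fit:     ", l1_fit_clean, "->", l1_fit_corrupted)
```

In this toy setting a single corrupted point drags the squared-loss fit far from the bulk of the data, while the ℓ1 fit does not move. The price is that the absolute loss is non-differentiable at zero residual, which is the difficulty the abstract's model-based approach is meant to address.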

[1] Feiping Nie, et al. Efficient and Robust Feature Selection via Joint ℓ2,1-Norms Minimization, 2010, NIPS.

[2] Anders P. Eriksson, et al. Efficient computation of robust low-rank matrix approximations in the presence of missing data using the L1 norm, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[3] Nojun Kwak, et al. Principal Component Analysis Based on L1-Norm Maximization, 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4] Korris Fu-Lai Chung, et al. The l2,1-Norm Stacked Robust Autoencoders for Domain Adaptation, 2016, AAAI.

[5] Arindam Banerjee, et al. Online Alternating Direction Method (longer version), 2013, ArXiv.

[6] Xiaoming Yuan, et al. Recovering Low-Rank and Sparse Components of Matrices from Incomplete and Noisy Observations, 2011, SIAM J. Optim.

[7] Ana de Almeida, et al. Nonnegative Matrix Factorization, 2018.

[8] Eyke Hüllermeier, et al. Multi-target prediction: a unifying view on problems and methods, 2018, Data Mining and Knowledge Discovery.

[9] Mikael Johansson, et al. Convergence of a Stochastic Gradient Method with Momentum for Nonsmooth Nonconvex Optimization, 2020, ICML.

[10] Angshul Majumdar, et al. Stacked Robust Autoencoder for Classification, 2016, ICONIP.

[11] Arindam Banerjee, et al. Online Alternating Direction Method, 2012, ICML.

[12] Jieping Ye, et al. Efficient L1/Lq Norm Regularization, 2010, ArXiv.

[13] Grigorios Tsoumakas, et al. Multi-target regression via input space expansion: treating targets as inputs, 2012, Machine Learning.

[14] John Wright, et al. RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15] Peter Filzmoser, et al. Robust Factorization of a Data Matrix, 1998, COMPSTAT.

[16] Jérôme Idier, et al. Algorithms for Nonnegative Matrix Factorization with the β-Divergence, 2010, Neural Computation.

[17] Mehran Mesbahi, et al. Online distributed ADMM via dual averaging, 2014, 53rd IEEE Conference on Decision and Control.

[18] P. Paatero, et al. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values, 1994.

[19] Alexander Shapiro, et al. Stochastic Approximation approach to Stochastic Programming, 2013.

[20] Chris H. Q. Ding, et al. Robust nonnegative matrix factorization using L21-norm, 2011, CIKM '11.

[21] Dmitriy Drusvyatskiy, et al. Efficiency of minimizing compositions of convex functions and smooth maps, 2016, Math. Program.

[22] Taiji Suzuki. Dual Averaging and Proximal Gradient Descent for Online Alternating Direction Multiplier Method, 2013.

[23] C. Michelot. A finite algorithm for finding the projection of a point onto the canonical simplex of ℝ^n, 1986.

[24] Stephen P. Boyd, et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, 2011, Found. Trends Mach. Learn.

[25] Grigorios Tsoumakas, et al. Multi-target regression via input space expansion: treating targets as inputs, 2012, Machine Learning.

[26] Lei Shi, et al. Robust Multiple Kernel K-means Using L21-Norm, 2015, IJCAI.

[27] Pascal Vincent, et al. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion, 2010, J. Mach. Learn. Res.

[28] Dmitriy Drusvyatskiy, et al. Stochastic model-based minimization of weakly convex functions, 2018, SIAM J. Optim.

[29] Mike E. Davies, et al. Iterative Hard Thresholding for Compressed Sensing, 2008, ArXiv.

[30] Hédy Attouch, et al. Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Lojasiewicz Inequality, 2008, Math. Oper. Res.

[31] Stratis Ioannidis, et al. Massively Distributed Graph Distances, 2020, IEEE Transactions on Signal and Information Processing over Networks.

[32] Jean-Philippe Vial, et al. Strong and Weak Convexity of Sets and Functions, 1983, Math. Oper. Res.

[33] Mohamed-Jalal Fadili, et al. Non-smooth Non-convex Bregman Minimization: Unification and New Algorithms, 2017, Journal of Optimization Theory and Applications.

[34] Feng Ruan, et al. Stochastic Methods for Composite and Weakly Convex Optimization Problems, 2017, SIAM J. Optim.

[35] Alexander G. Gray, et al. Stochastic Alternating Direction Method of Multipliers, 2013, ICML.

[36] Scott Pesme, et al. Online Robust Regression via SGD on the l1 loss, 2020, NeurIPS.

[37] Stephen J. Wright, et al. A proximal method for composite minimization, 2008, Mathematical Programming.

[38] ChengXiang Zhai, et al. Robust Unsupervised Feature Selection, 2013, IJCAI.

[39] Peter Ochs, et al. Model Function Based Conditional Gradient Method with Armijo-like Line Search, 2019, ICML.

[40] Yuanyuan Liu, et al. Accelerated Variance Reduced Stochastic ADMM, 2017, AAAI.

[41] Damek Davis, et al. Proximally Guided Stochastic Subgradient Method for Nonsmooth, Nonconvex Problems, 2017, SIAM J. Optim.

[42] Nicolas Gillis, et al. Inertial Block Proximal Methods for Non-Convex Non-Smooth Optimization, 2019, ICML.

[43] Philippe C. Besse, et al. A L1-norm PCA and a Heuristic Approach, 1996.

[44] Shai Shalev-Shwartz, et al. Online Learning and Online Convex Optimization, 2012, Found. Trends Mach. Learn.

[45] James T. Kwok, et al. Fast-and-Light Stochastic ADMM, 2016, IJCAI.

[46] Chris H. Q. Ding, et al. R1-PCA: rotational invariant L1-norm principal component analysis for robust subspace factorization, 2006, ICML.

[47] Balas K. Natarajan, et al. Sparse Approximate Solutions to Linear Systems, 1995, SIAM J. Comput.

[48] Nicolas Gillis. Nonnegative Matrix Factorization, 2020.

[49] Xuelong Li, et al. L1-Norm-Based 2DPCA, 2010, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[50] John Wright, et al. RASL: Robust Alignment by Sparse and Low-Rank Decomposition for Linearly Correlated Images, 2012, IEEE Trans. Pattern Anal. Mach. Intell.

[51] Dmitriy Drusvyatskiy, et al. Error Bounds, Quadratic Growth, and Linear Convergence of Proximal Methods, 2016, Math. Oper. Res.