Kernel-based online regression with canal loss

Abstract Online learning methods have achieved notable success within the framework of online convex optimization. Meanwhile, nonconvex loss functions have received considerable attention for their noise resiliency and sparsity. Existing nonconvex losses are typically designed to be smooth so that optimization algorithms are easier to derive; however, smooth losses no longer yield sparse support vectors. In this work, we focus on regression with a particular nonconvex loss function, the canal loss, and propose a kernel-based online regression algorithm, noise-resilient online regression (NROR), to handle noisy labels. The canal loss is a horizontally truncated loss and retains the merit of sparsity. Although the canal loss is nonconvex and nonsmooth, the regularized canal loss satisfies a convexity-like property called strong pseudo-convexity. Furthermore, a sublinear regret bound for NROR is proved under certain assumptions. Experimental studies show that NROR achieves low prediction errors, in terms of mean absolute error and root mean squared error, on datasets with heavy label noise. In particular, we check whether the convergence assumptions strictly hold in practice and find that they are rarely violated and that the convergence rate is unaffected.
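The abstract does not give the canal loss formula or the NROR update, so the following is a minimal Python sketch under stated assumptions, not the paper's definitive algorithm. It assumes the canal loss is an epsilon-insensitive loss horizontally truncated at height sigma (one plausible construction of a "horizontally truncated loss"), and pairs it with a NORMA-style functional-gradient online kernel update. The class name, parameter names, and the truncation form are all illustrative assumptions.

```python
import numpy as np

def canal_loss(r, eps=0.1, sigma=1.0):
    """Assumed canal loss: epsilon-insensitive loss truncated at height sigma.
    Zero inside the eps-tube, flat (= sigma) for large residuals, so gross
    outliers contribute nothing beyond a bounded penalty."""
    return np.minimum(np.maximum(np.abs(r) - eps, 0.0), sigma)

def canal_subgrad(r, eps=0.1, sigma=1.0):
    """A subgradient of the assumed loss w.r.t. the prediction f, where
    r = y - f(x). It vanishes inside the tube and past the truncation point."""
    if abs(r) <= eps or abs(r) - eps >= sigma:
        return 0.0
    return -np.sign(r)

class OnlineKernelRegressor:
    """Hypothetical NORMA-style online kernel regressor using the assumed
    canal loss; a sketch of the kind of update NROR performs."""
    def __init__(self, gamma=1.0, lam=0.01, eta=0.1, eps=0.1, sigma=1.0):
        self.gamma, self.lam, self.eta = gamma, lam, eta
        self.eps, self.sigma = eps, sigma
        self.sv, self.alpha = [], []   # support vectors and coefficients

    def _kernel(self, a, b):
        # Gaussian (RBF) kernel
        return np.exp(-self.gamma * np.sum((a - b) ** 2))

    def predict(self, x):
        return sum(a * self._kernel(s, x) for a, s in zip(self.alpha, self.sv))

    def update(self, x, y):
        x = np.asarray(x, dtype=float)
        g = canal_subgrad(y - self.predict(x), self.eps, self.sigma)
        # Shrink existing coefficients (the regularization term) ...
        self.alpha = [(1 - self.eta * self.lam) * a for a in self.alpha]
        # ... and add a new support vector only when the subgradient is
        # nonzero: in-tube points and truncated outliers are skipped,
        # which is the source of the sparsity claimed for the canal loss.
        if g != 0.0:
            self.sv.append(x)
            self.alpha.append(-self.eta * g)
```

Under these assumptions, the zero-subgradient region past the truncation point is what makes the method noise-resilient: a label corrupted by gross noise produces no update at all, rather than a large one as under the squared or absolute loss.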
