Robust pairwise learning with Huber loss

Abstract. Pairwise learning arises naturally in machine learning tasks such as AUC maximization, ranking, and metric learning. In this paper we propose a new pairwise learning algorithm based on the additive noise regression model. It adopts the pairwise Huber loss and remains effective even when the noise satisfies only a weak moment condition. Owing to the robustness of the Huber loss function, the new method is resistant to heavy-tailed errors and outliers in the response variable. We establish a comparison theorem that characterizes the gap between the excess generalization error and the prediction error, and we derive error bounds and convergence rates under appropriate conditions. All results are established under a (1 + ϵ)-th moment condition on the noise variable. This condition is rather weak, particularly when ϵ < 1, in which case the noise variable need not even have a finite variance.
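To make the idea of a pairwise Huber loss concrete, the following is a minimal sketch, not the paper's exact formulation. It assumes the standard Huber function with a scale parameter `sigma` (quadratic for small residuals, linear for large ones) applied to the pairwise residual (y_i - y_j) - (f(x_i) - f(x_j)). The function names `huber` and `pairwise_huber_risk`, the parameter `sigma`, and the choice to average over unordered pairs are illustrative assumptions; the abstract does not specify them.

```python
import numpy as np

def huber(t, sigma=1.0):
    """Standard Huber function (assumed parameterization):
    0.5 * t^2 if |t| <= sigma, else sigma * |t| - 0.5 * sigma^2."""
    t = np.asarray(t, dtype=float)
    abs_t = np.abs(t)
    return np.where(abs_t <= sigma, 0.5 * t ** 2, sigma * abs_t - 0.5 * sigma ** 2)

def pairwise_huber_risk(pred, y, sigma=1.0):
    """Empirical pairwise Huber risk over all unordered pairs i < j.

    The residual of a pair is (y_i - y_j) - (pred_i - pred_j),
    which equals r_i - r_j with r = y - pred.
    """
    pred = np.asarray(pred, dtype=float)
    y = np.asarray(y, dtype=float)
    r = y - pred
    diff = r[:, None] - r[None, :]          # all pairwise residual differences
    iu = np.triu_indices(len(y), k=1)       # keep each unordered pair once
    return huber(diff[iu], sigma).mean()

# Hypothetical data: one response is a gross outlier.
y = np.array([0.0, 0.0, 10.0])
pred = np.zeros(3)
risk = pairwise_huber_risk(pred, y, sigma=1.0)
```

In this toy example each pair involving the outlier contributes a loss that grows linearly in the residual (9.5 here) rather than quadratically (50 under a squared loss with the same 0.5 factor), which is the mechanism behind the robustness to heavy-tailed noise described above.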
