On the Generalization Ability of Online Learning Algorithms for Pairwise Loss Functions

In this paper, we study the generalization properties of online-learning-based stochastic methods for supervised learning problems in which the loss function depends on more than one training sample (e.g., metric learning, ranking). We present a generic decoupling technique that enables us to derive Rademacher complexity-based generalization error bounds. Our bounds are in general tighter than those obtained by Wang et al. (2012) for the same problem. Using our decoupling technique, we further obtain fast convergence rates for strongly convex pairwise loss functions. We also analyze a class of memory-efficient online learning algorithms for pairwise learning problems that use only a bounded subset of past training samples to update the hypothesis at each step. Finally, to complement our generalization bounds, we propose a novel memory-efficient online learning algorithm for higher-order learning problems with bounded regret guarantees.
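To make the buffer-based setting concrete, the following is a minimal sketch (not the paper's algorithm) of an online learner for a pairwise loss that updates the hypothesis against a bounded reservoir of past samples instead of the full history; the pairwise hinge loss, the buffer size, the step sizes, and all names below are illustrative assumptions.

```python
# Sketch: online gradient descent with a pairwise hinge loss, where each new
# sample is paired only with a bounded reservoir of past samples (reservoir
# sampling as in Vitter, 1985). All constants and names are assumptions.
import random
import numpy as np

BUFFER_SIZE = 100   # assumed bound s on the number of stored past samples
ETA = 0.1           # assumed base step size

def pairwise_hinge_grad(w, x, y, x_past, y_past):
    """Subgradient of max(0, 1 - (y - y') <w, x - x'>) for labels y, y' in {-1, +1}."""
    diff = x - x_past
    if y == y_past:
        return np.zeros_like(w)
    margin = (y - y_past) * np.dot(w, diff)
    if margin >= 1.0:
        return np.zeros_like(w)
    return -(y - y_past) * diff

def online_pairwise_learner(stream, dim):
    w = np.zeros(dim)
    buffer = []  # bounded set of past (x, y) pairs
    for t, (x, y) in enumerate(stream, start=1):
        # Update w using the current sample paired with every buffered sample.
        if buffer:
            grad = np.mean([pairwise_hinge_grad(w, x, y, xp, yp)
                            for xp, yp in buffer], axis=0)
            w -= (ETA / np.sqrt(t)) * grad
        # Reservoir sampling keeps the buffer a uniform sample of the stream.
        if len(buffer) < BUFFER_SIZE:
            buffer.append((x, y))
        else:
            j = random.randint(0, t - 1)
            if j < BUFFER_SIZE:
                buffer[j] = (x, y)
    return w
```

In this sketch, per-step memory and computation are bounded by the buffer size rather than growing linearly with the length of the stream, which is the regime the memory-efficient analysis above is concerned with.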

[1] Rong Jin et al. Online AUC Maximization. ICML, 2011.

[2] G. Lugosi et al. Ranking and empirical minimization of U-statistics. arXiv:math/0603123, 2006.

[3] Roni Khardon et al. Online Learning with Pairwise Loss Functions. arXiv, 2013.

[4] Rong Jin et al. Regularized Distance Metric Learning: Theory and Algorithm. NIPS, 2009.

[5] Jeffrey Scott Vitter et al. Random sampling with a reservoir. ACM Transactions on Mathematical Software (TOMS), 1985.

[6] Koray Kavukcuoglu et al. A Binary Classification Framework for Two-Stage Multiple Kernel Learning. ICML, 2012.

[7] N. Cristianini et al. On Kernel-Target Alignment. NIPS, 2001.

[8] Ambuj Tewari et al. On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization. NIPS, 2008.

[9] R. Serfling. Probability Inequalities for the Sum in Sampling without Replacement. 1974.

[10] Andreas Christmann et al. Support Vector Machines. Data Mining and Knowledge Discovery Handbook, 2008.

[11] Nathan Srebro et al. Fast Rates for Regularized Objectives. NIPS, 2008.

[12] Michael I. Jordan et al. Distance Metric Learning with Application to Clustering with Side-Information. NIPS, 2002.

[13] Martin Zinkevich et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent. ICML, 2003.

[14] Elad Hazan et al. Logarithmic regret algorithms for online convex optimization. Machine Learning, 2006.

[15] Ulf Brefeld et al. AUC maximizing support vector learning. 2005.

[16] Mehryar Mohri et al. Two-Stage Learning Kernel Algorithms. ICML, 2010.

[17] Maria-Florina Balcan et al. On a theory of learning with similarity functions. ICML, 2006.

[18] Roni Khardon et al. Generalization Bounds for Online Learning Algorithms with Pairwise Loss Functions. COLT, 2012.

[19] Mehryar Mohri et al. Generalization Bounds for Learning Kernels. ICML, 2010.

[20] M. Talagrand et al. Probability in Banach Spaces: Isoperimetry and Processes. 1991.

[21] Claudio Gentile et al. Improved Risk Tail Bounds for On-Line Algorithms. IEEE Transactions on Information Theory, 2005.

[22] Ambuj Tewari et al. On the Generalization Ability of Online Strongly Convex Programming Algorithms. NIPS, 2008.

[23] Ambuj Tewari et al. Regularization Techniques for Learning with Matrices. Journal of Machine Learning Research, 2009.

[24] Marc Sebban et al. Similarity Learning for Provably Accurate Sparse Linear Classification. ICML, 2012.

[25] Shivani Agarwal et al. Generalization Bounds for Ranking Algorithms via Algorithmic Stability. Journal of Machine Learning Research, 2009.

[26] Qiong Cao et al. Generalization bounds for metric and similarity learning. Machine Learning, 2012.

[27] Claudio Gentile et al. On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory, 2001.

[28] D. Freedman. On Tail Probabilities for Martingales. 1975.