A Modern Introduction to Online Learning

In this monograph, I introduce the basic concepts of online learning through a modern view of Online Convex Optimization. Here, online learning refers to the framework of regret minimization under worst-case assumptions. I present first-order and second-order algorithms for online learning with convex losses, in Euclidean and non-Euclidean settings. All the algorithms are clearly presented as instantiations of Online Mirror Descent or Follow-The-Regularized-Leader and their variants. Particular attention is given to the issues of tuning the parameters of the algorithms and of learning in unbounded domains, through adaptive and parameter-free online learning algorithms. Non-convex losses are dealt with through convex surrogate losses and through randomization. The bandit setting is also briefly discussed, touching on the problems of adversarial and stochastic multi-armed bandits. These notes do not require prior knowledge of convex analysis, and all the required mathematical tools are rigorously explained. Moreover, all the proofs have been carefully chosen to be as simple and as short as possible.
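As a minimal illustration of the regret-minimization framework described above (a sketch of my own, not taken from the monograph), here is projected online gradient descent on a one-dimensional bounded domain. The losses, step-size schedule, and domain are illustrative choices: losses l_t(x) = (x - z_t)^2 on [-radius, radius], with the classic time-varying step size eta_t proportional to 1/sqrt(t).

```python
import math

def ogd_loss(z_seq, radius=1.0, grad_bound=4.0):
    """Projected Online Gradient Descent on the interval [-radius, radius].

    At each round t the learner plays x_t, suffers l_t(x_t) = (x_t - z_t)^2,
    observes the gradient, and takes a projected gradient step with
    eta_t = D / (G * sqrt(t)), where D is the diameter of the domain and
    G bounds the gradient norms. Returns the cumulative loss.
    """
    x = 0.0
    total = 0.0
    diameter = 2.0 * radius
    for t, z in enumerate(z_seq, start=1):
        total += (x - z) ** 2                      # suffer loss l_t(x_t)
        g = 2.0 * (x - z)                          # gradient of l_t at x_t
        eta = diameter / (grad_bound * math.sqrt(t))
        x = min(max(x - eta * g, -radius), radius)  # gradient step + projection
    return total

def regret(z_seq, radius=1.0):
    """Regret of OGD against the best fixed point in [-radius, radius] in hindsight."""
    best = min(max(sum(z_seq) / len(z_seq), -radius), radius)
    comparator_loss = sum((best - z) ** 2 for z in z_seq)
    return ogd_loss(z_seq, radius) - comparator_loss
```

The point of the sketch is the hallmark of online learning discussed in the monograph: the *average* regret vanishes as the number of rounds grows, without any stochastic assumption on the sequence z_1, z_2, ....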
