Temporal Variability in Implicit Online Learning

In the setting of online learning, implicit algorithms have proven highly successful in practice. However, the tightest regret analyses show only marginal improvements over Online Mirror Descent. In this work, we shed light on this behavior by carrying out a careful regret analysis. We prove a novel static regret bound that depends on the temporal variability of the sequence of loss functions, a quantity that often arises when considering dynamic competitors. We show, for example, that the regret can be constant when the temporal variability is constant and the learning rate is tuned appropriately, without requiring smooth losses. Moreover, we present an adaptive algorithm that achieves this regret bound without prior knowledge of the temporal variability, and we prove a matching lower bound. Finally, we validate our theoretical findings on classification and regression datasets.
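
For concreteness, here is a minimal sketch of a single implicit (proximal) online update, the kind of step analyzed in this line of work. The temporal variability of a loss sequence is commonly defined as V_T = sum_{t=2}^T max_x |l_t(x) - l_{t-1}(x)|, and an implicit update solves x_{t+1} = argmin_u eta_t * l_t(u) + 0.5 * ||u - x_t||^2 instead of linearizing l_t as explicit gradient methods do. The sketch below instantiates this update for the squared loss, where the proximal step has a closed form; the fixed learning rate eta and the function names are illustrative assumptions, not the paper's adaptive tuning.

import numpy as np

def implicit_update(x, a, y, eta):
    # One implicit (proximal) step for the squared loss
    #   l_t(u) = 0.5 * (a @ u - y) ** 2,
    # i.e. x_next = argmin_u  eta * l_t(u) + 0.5 * ||u - x||^2.
    # Setting the gradient to zero and solving along the direction a
    # gives the closed form below.
    residual = float(a @ x) - y
    step = eta * residual / (1.0 + eta * float(a @ a))
    return x - step * a

# Illustrative usage on a toy regression stream.
rng = np.random.default_rng(0)
x = np.zeros(5)
for t in range(100):
    a_t = rng.normal(size=5)
    y_t = float(a_t @ np.ones(5))  # slowly varying target would give small V_T
    x = implicit_update(x, a_t, y_t, eta=0.5)

Compared with the explicit step x_{t+1} = x_t - eta * grad l_t(x_t), the denominator 1 + eta * ||a_t||^2 automatically damps the update, which gives one intuition for why implicit methods can adapt to sequences with small temporal variability.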
