Online Optimization: Competing with Dynamic Comparators

Recent literature on online learning has focused on developing adaptive algorithms that exploit regularity in the sequence of observations while retaining worst-case performance guarantees. A complementary direction is to develop prediction methods that perform well against complex benchmarks. In this paper, we address these two directions together. We present a fully adaptive method that competes with dynamic benchmarks and whose regret guarantee scales with the regularity of both the sequence of cost functions and the comparator sequence. Notably, the regret bound adapts to the smaller of the two complexity measures in the problem environment. Finally, we apply our results to drifting two-player zero-sum games, where both players achieve no-regret guarantees against the best sequence of actions in hindsight.
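To make the benchmark precise, the following is a minimal sketch of a standard dynamic-regret formalization; the symbols $f_t$, $x_t$, $u_t$, $C_T$, and $V_T$ are generic notation assumed here for illustration, not necessarily the paper's own definitions.

% Dynamic regret of the iterates x_1,...,x_T against an arbitrary
% comparator sequence u_1,...,u_T (a sketch under the assumptions above):
\[
  \mathrm{Reg}_T(u_1,\dots,u_T) \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \sum_{t=1}^{T} f_t(u_t).
\]
% One common regularity measure for the comparator sequence is its path length,
\[
  C_T \;=\; \sum_{t=2}^{T} \lVert u_t - u_{t-1} \rVert,
\]
% and one common regularity measure for the cost functions is their cumulative variation,
\[
  V_T \;=\; \sum_{t=2}^{T} \sup_{x} \bigl\lvert f_t(x) - f_{t-1}(x) \bigr\rvert.
\]
% Adaptive bounds of the type described above scale with such regularity
% measures rather than only with the worst case over all sequences.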
