The Hedge Algorithm on a Continuum

We consider an online optimization problem on a compact subset S ⊂ R^n (not necessarily convex), in which a decision maker chooses, at each iteration t, a probability distribution x^(t) over S, and seeks to minimize the cumulative expected loss Σ_{t=1}^T E_{s∼x^(t)}[ℓ^(t)(s)], where ℓ^(t) is a Lipschitz loss function revealed at the end of iteration t. Building on previous work, we propose a generalized Hedge algorithm and show an O(√(t log t)) bound on the regret when the losses are uniformly Lipschitz and S is uniformly fat (a weaker condition than convexity). Finally, we propose a generalization of the dual averaging method to the set of Lebesgue-continuous distributions over S.
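To fix intuition for the continuum setting above, the finite-action special case is the classical Hedge (exponential-weights) update, where the distribution x^(t) is proportional to the exponential of the negative cumulative loss. The sketch below is an illustrative implementation of that finite case only; the loss matrix, the learning rate η, and the function name are placeholders, not part of the paper's construction, and the continuum version of the paper replaces the sum over actions with an integral against a measure over S.

```python
import numpy as np

def hedge(losses, eta):
    """Exponential-weights (Hedge) update over a finite action set.

    losses: (T, K) array; losses[t, k] is the loss of action k at round t.
    eta:    learning rate.
    Returns the (T, K) array of distributions x^(t) played at each round.
    """
    T, K = losses.shape
    cum = np.zeros(K)              # cumulative losses of each action
    dists = np.empty((T, K))
    for t in range(T):
        # Subtracting the minimum before exponentiating avoids underflow
        # and leaves the normalized weights unchanged.
        w = np.exp(-eta * (cum - cum.min()))
        dists[t] = w / w.sum()
        cum += losses[t]
    return dists
```

With η of order √(log K / T), this finite-action scheme attains O(√(T log K)) regret against the best fixed action; the paper's contribution is the analogue of this guarantee when the action set is a continuum S.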
