
The Hedge Algorithm on a Continuum

We consider an online optimization problem on a compact subset $S \subset \mathbb{R}^n$ (not necessarily convex), in which a decision maker chooses, at each iteration $t$, a probability distribution $x^{(t)}$ over $S$ and seeks to minimize the cumulative expected loss $\sum_{t=1}^{T} \mathbb{E}_{s \sim x^{(t)}}[\ell^{(t)}(s)]$, where $\ell^{(t)}$ is a Lipschitz loss function revealed at the end of iteration $t$. Building on previous work, we propose a generalized Hedge algorithm and show an $O(\sqrt{t \log t})$ bound on the regret when the losses are uniformly Lipschitz and $S$ is uniformly fat (a weaker condition than convexity). Finally, we propose a generalization to the dual averaging method on the set of Lebesgue-continuous distributions over $S$.
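To make the setting concrete, here is a minimal sketch of the standard exponential-weights (Hedge) update run on a finite grid that approximates S = [0, 1]. It is not the generalized algorithm of the paper. The grid discretization, the learning-rate schedule `eta`, the fixed loss |s - 0.3|, and the helper name `hedge_on_grid` are all assumptions chosen for illustration.

```python
import numpy as np


def hedge_on_grid(losses, grid, eta):
    """Exponential-weights (Hedge) updates over a finite grid of points.

    losses: sequence of callables; losses[t](grid) returns the loss at each grid point
    grid:   1-D array approximating the set S (an assumption of this sketch)
    eta:    callable mapping the 1-based iteration index t to a learning rate
    Returns the distributions played and the cumulative expected loss.
    """
    cumulative = np.zeros(len(grid))  # sum of past losses at each grid point
    distributions = []
    expected_loss = 0.0
    for t, loss in enumerate(losses, start=1):
        # Play weights proportional to exp(-eta_t * cumulative loss);
        # subtracting the max logit avoids overflow and underflow.
        logits = -eta(t) * cumulative
        weights = np.exp(logits - logits.max())
        x_t = weights / weights.sum()
        distributions.append(x_t)
        # The loss is revealed only after the distribution has been chosen.
        l_t = loss(grid)
        expected_loss += float(x_t @ l_t)
        cumulative += l_t
    return distributions, expected_loss


# Hypothetical example: the same 1-Lipschitz loss |s - 0.3| at every iteration.
T = 200
grid = np.linspace(0.0, 1.0, 201)
losses = [lambda s: np.abs(s - 0.3) for _ in range(T)]
eta = lambda t: np.sqrt(np.log(t + 1) / t)  # assumed schedule, not taken from the paper

dists, total = hedge_on_grid(losses, grid, eta)
best_fixed = min(sum(l(grid) for l in losses))  # best single grid point in hindsight
regret = total - best_fixed
print(f"regret after {T} iterations: {regret:.3f}")
```

The first distribution is uniform because no loss has been observed yet. Later distributions concentrate around the grid point with the smallest cumulative loss. The regret here is measured against the best fixed grid point, which only approximates the best point of S.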

Alexandre M. Bayen | Claire J. Tomlin | Walid Krichene | Maximilian Balandat
