1 Perturbation Techniques in Online Learning and Optimization

In this chapter we give a new perspective on so-called perturbation methods, which have been applied in a number of different fields, but in particular to adversarial online learning problems. We show that the classical algorithm known as Follow The Perturbed Leader (FTPL) can be viewed through the lens of stochastic smoothing, a tool that has proven popular within convex optimization. We prove regret bounds for several online learning settings and provide generic tools for analyzing perturbation algorithms. We also consider the so-called bandit setting, where the feedback to the learner is significantly constrained, and we show that near-optimal bounds can be achieved as long as a simple condition on the perturbation distribution is met.
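As a concrete illustration of the classical FTPL idea discussed above, the following is a minimal sketch for the experts (linear-loss) setting: at each round the learner draws a fresh random perturbation, adds it to the cumulative losses observed so far, and plays the expert whose perturbed cumulative loss is smallest. The function name, the exponential perturbation, and the scale parameter `eta` are illustrative choices, not the specific construction analyzed in the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

def ftpl_expert_choices(loss_matrix, eta=1.0):
    """Follow The Perturbed Leader on a sequence of expert loss vectors.

    loss_matrix has shape (n_rounds, n_experts); row t is the loss vector
    revealed after the learner commits to a choice in round t.
    """
    n_rounds, n_experts = loss_matrix.shape
    cumulative = np.zeros(n_experts)
    choices = []
    for t in range(n_rounds):
        # Draw a fresh perturbation each round and follow the
        # "perturbed leader": the expert minimizing perturbed cumulative loss.
        noise = rng.exponential(scale=eta, size=n_experts)
        choices.append(int(np.argmin(cumulative - noise)))
        # Only after committing does the learner observe this round's losses.
        cumulative += loss_matrix[t]
    return choices
```

Redrawing the noise independently every round is what randomizes the play and, viewed in expectation, smooths the underlying "follow the leader" arg-min rule.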
