Exploration Potential

We introduce exploration potential, a quantity that measures how much a reinforcement learning agent has explored its environment class. In contrast to information gain, exploration potential takes the problem’s reward structure into account. This leads to an exploration criterion that is both necessary and sufficient for asymptotic optimality (learning to act optimally across the entire environment class). Our experiments in multi-armed bandits use exploration potential to illustrate how different algorithms make the tradeoff between exploration and exploitation.
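The abstract does not reproduce the formal definition of exploration potential, so the following is a minimal sketch of one plausible reading: in a bandit with a finite hypothesis class, measure the posterior-weighted shortfall of the posterior-greedy arm relative to each environment's best arm. The hypothesis class ENVS, the helper names posterior_update and exploration_potential, and the epsilon-greedy agent are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Hypothetical finite class of two-armed Bernoulli bandits.
# Each row is one candidate environment (a pair of arm means);
# these particular values are illustrative, not from the paper.
ENVS = np.array([[0.2, 0.8],
                 [0.8, 0.2],
                 [0.5, 0.6]])

def posterior_update(log_post, arm, reward):
    """Bayes update of the log-posterior over ENVS after pulling `arm`."""
    p = ENVS[:, arm]
    log_post = log_post + np.log(p if reward == 1 else 1.0 - p)
    return log_post - np.logaddexp.reduce(log_post)  # renormalise

def exploration_potential(log_post):
    """Posterior-weighted shortfall of the posterior-greedy arm.

    One plausible instantiation of a reward-sensitive exploration
    measure: EP = E_{nu ~ posterior}[max_a mu_nu(a) - mu_nu(a_greedy)],
    where a_greedy maximises the posterior-mean reward.
    """
    post = np.exp(log_post)
    a_greedy = int(np.argmax(post @ ENVS))       # Bayes-greedy arm
    shortfall = ENVS.max(axis=1) - ENVS[:, a_greedy]
    return float(post @ shortfall)

rng = np.random.default_rng(0)
true_env = 0                                     # environment the agent actually faces
log_post = np.full(len(ENVS), -np.log(len(ENVS)))  # uniform prior

for t in range(200):
    # epsilon-greedy agent: mostly exploit the posterior-greedy arm.
    if rng.random() < 0.1:
        arm = int(rng.integers(2))
    else:
        arm = int(np.argmax(np.exp(log_post) @ ENVS))
    reward = int(rng.random() < ENVS[true_env, arm])
    log_post = posterior_update(log_post, arm, reward)
    if t % 50 == 0:
        print(f"t={t:3d}  EP={exploration_potential(log_post):.4f}")
```

Under this reading, EP near zero means the posterior has identified an arm that is near-optimal in every plausible environment; tracking EP over time for different agents is one way to visualize the exploration-exploitation tradeoff the abstract describes.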
