On optimal foraging and multi-armed bandits

We consider two variants of the standard multi-armed bandit problem: the multi-armed bandit problem with transition costs and the multi-armed bandit problem on graphs. For both problems we develop block allocation algorithms whose expected cumulative regret is uniformly dominated by a logarithmic function of time, and whose expected cumulative number of transitions from one arm to another is uniformly dominated by a double-logarithmic function of time. We observe that the multi-armed bandit problem with transition costs, together with the associated block allocation algorithm, captures the key features of popular animal foraging models in the literature.
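The general idea of block allocation can be illustrated with a minimal sketch. This is not the algorithm developed in the paper: it assumes Bernoulli rewards, uses the standard UCB1 index to pick an arm, and simply doubles the block length after every block. The function name `block_ucb` and all of its parameters are hypothetical. Because the arm can change only at block boundaries, the number of transitions in this sketch grows at most logarithmically with the horizon. The double-logarithmic transition bound and the logarithmic regret bound stated above rely on a more careful block structure that is not reproduced here, and no regret guarantee is claimed for this simplified version.

```python
import math
import random


def block_ucb(means, horizon, seed=0):
    """Illustrative block allocation on Bernoulli arms (hypothetical helper).

    Returns (pseudo_regret, num_switches). The arm is chosen by the UCB1
    index at the start of each block and then held for the whole block.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms      # number of plays per arm
    sums = [0.0] * n_arms      # total reward per arm
    best = max(means)
    regret = 0.0
    switches = 0
    prev_arm = None
    t = 0
    block_len = 1              # assumption: block length doubles each block

    while t < horizon:
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            # Initialization: play each arm once.
            arm, length = untried[0], 1
        else:
            # UCB1 index: empirical mean plus exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
            length = min(block_len, horizon - t)
            block_len *= 2

        if prev_arm is not None and arm != prev_arm:
            switches += 1      # a transition is incurred only between blocks
        prev_arm = arm

        for _ in range(length):
            reward = 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            sums[arm] += reward
            regret += best - means[arm]
            t += 1

    return regret, switches
```

For example, `block_ucb([0.2, 0.8], 10000)` plays at most 16 blocks in total, so it can switch arms at most 15 times, whereas a per-step UCB1 policy may switch far more often. In a foraging reading, each arm is a patch and each switch is a costly move between patches.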
