On optimal foraging and multi-armed bandits

We consider two variants of the standard multi-armed bandit problem: the multi-armed bandit problem with transition costs and the multi-armed bandit problem on graphs. For these problems we develop block allocation algorithms whose expected cumulative regret is uniformly dominated by a logarithmic function of time and whose expected cumulative number of transitions from one arm to another is uniformly dominated by a double-logarithmic function of time. We observe that the multi-armed bandit problem with transition costs and the associated block allocation algorithm capture the key features of popular animal foraging models in the literature.
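The abstract does not spell out the block allocation rule, so the following Python sketch only illustrates the general frame/block idea under assumptions of our own. Rewards are Bernoulli. Frame k spans 2^(k-1) pulls and is cut into blocks of length k, with a shorter last block when k does not divide the frame length. At the start of each block one arm is chosen and then played for the whole block. The arm is chosen by the UCB1 index of Auer et al. (2002), standing in for whatever index the paper uses. The function `block_ucb` and its parameters are hypothetical. It returns the number of pulls of each arm, from which regret follows, and the number of arm-to-arm transitions. These are the two quantities bounded in the abstract.

```python
import math
import random


def block_ucb(means, horizon, seed=0):
    """Hypothetical frame/block allocation sketch on a Bernoulli bandit.

    Assumptions (not taken from the paper): frame k has 2^(k-1) pulls,
    blocks inside frame k have length k, and arms are ranked by UCB1.
    Returns (pull counts per arm, number of arm-to-arm transitions).
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms        # number of pulls of each arm
    totals = [0.0] * n_arms      # cumulative reward of each arm
    transitions = 0              # how often the chosen arm changed
    current = None               # arm played in the previous block
    t = 0                        # total pulls so far

    def play(arm, length):
        # Pull `arm` `length` times in a row, recording Bernoulli rewards.
        nonlocal t
        for _ in range(length):
            reward = 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            totals[arm] += reward
            t += 1

    k = 1
    while t < horizon:
        frame_len = 2 ** (k - 1)                 # frame k covers 2^(k-1) pulls
        remaining = min(frame_len, horizon - t)  # truncate the final frame
        while remaining > 0:
            block_len = min(k, remaining)        # blocks of length k in frame k
            if 0 in counts:
                # Initialisation: give each untried arm one block.
                arm = counts.index(0)
            else:
                # UCB1 index (Auer et al., 2002), evaluated once per block.
                arm = max(range(n_arms),
                          key=lambda i: totals[i] / counts[i]
                          + math.sqrt(2 * math.log(t) / counts[i]))
            if current is not None and arm != current:
                transitions += 1
            current = arm
            play(arm, block_len)
            remaining -= block_len
        k += 1
    return counts, transitions
```

For example, `block_ucb([0.2, 0.8], 5000, seed=1)` plays a two-armed problem for 5000 steps. Under these assumptions, fixing every block length at 1 would reduce the loop to essentially plain UCB1, which may switch arms at every step. Letting the block length grow with the frame index is what limits how often the chosen arm can change.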

Vaibhav Srivastava | Naomi Ehrich Leonard | Paul B. Reverdy