Paper Information - Multi-Armed Bandits for Human-Machine Decision Making

Building an integrated human-machine decision-making system requires developing effective interfaces between the human and the machine. We develop such an interface by studying the multi-armed bandit problem, a simple sequential decision-making paradigm that can model a variety of tasks. We construct Bayesian algorithms for the multi-armed bandit problem, prove conditions under which these algorithms achieve good performance, and empirically show that, with appropriate priors, these algorithms effectively model human choice behavior; the priors then form a principled interface from human to machine. We take a signal processing perspective on the prior estimation problem and develop methods to estimate the priors given human choice data.

Vaibhav Srivastava | Paul B. Reverdy

[1] Jonathan D. Cohen, et al. Humans use directed and random exploration to solve the explore-exploit dilemma, 2014, Journal of Experimental Psychology: General.

[2] Yi Gai, et al. Learning Multiuser Channel Allocations in Cognitive Radio Networks: A Combinatorial Multi-Armed Bandit Formulation, 2010, 2010 IEEE Symposium on New Frontiers in Dynamic Spectrum (DySPAN).

[3] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.

[4] H. Robbins. Some aspects of the sequential design of experiments, 1952.

[5] Vaibhav Srivastava, et al. Correlated Multiarmed Bandit Problem: Bayesian Algorithms and Regret Analysis, 2015, ArXiv.

[6] Paul B. Reverdy. Modeling Human Decision-making in Multi-armed Bandits, 2013.

[7] Steven Kay, et al. Fundamentals of Statistical Signal Processing, 2001.

[8] Vaibhav Srivastava, et al. Satisficing in Multi-Armed Bandit Problems, 2015, IEEE Transactions on Automatic Control.

[9] Aurélien Garivier, et al. On Bayesian Upper Confidence Bounds for Bandit Problems, 2012, AISTATS.

[10] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.

[11] Paul Schrater. Structure learning in human sequential decision-making, 2009.

[12] M. Lee, et al. A Bayesian analysis of human decision-making on bandit problems, 2009.

[13] Naomi Ehrich Leonard, et al. Parameter Estimation in Softmax Decision-Making Models With Linear Objective Functions, 2015, IEEE Transactions on Automation Science and Engineering.