Multi-Armed Bandits for Human-Machine Decision Making

Building an integrated human-machine decision-making system requires developing effective interfaces between the human and the machine. We develop such an interface by studying the multi-armed bandit problem, a simple sequential decision-making paradigm that can model a variety of tasks. We construct Bayesian algorithms for the multi-armed bandit problem, prove conditions under which these algorithms achieve good performance, and empirically show that, with appropriate priors, these algorithms effectively model human choice behavior; the priors then form a principled interface from human to machine. We take a signal processing perspective on the prior estimation problem and develop methods to estimate the priors given human choice data.
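
As a rough illustration of the kind of Bayesian bandit algorithm the abstract refers to, the sketch below implements a generic Bayesian upper-confidence rule for Gaussian rewards. It is not taken from the paper. The function name `bayes_ucb_gaussian`, the quantile schedule `1 - 1/(t + 1)`, the known noise variance, and the shared Gaussian prior on every arm are all assumptions made for the example. The prior parameters (`prior_mean`, `prior_var`) are the quantities that would play the role of the human-to-machine interface described above.

```python
import random
from statistics import NormalDist


def bayes_ucb_gaussian(true_means, horizon, prior_mean=0.0, prior_var=100.0,
                       noise_var=1.0, seed=0):
    """Illustrative Bayesian upper-confidence bandit for Gaussian rewards.

    Assumptions (not from the source): rewards are Gaussian with known
    variance `noise_var`, every arm shares the same Gaussian prior, and the
    upper-confidence quantile at step t is 1 - 1/(t + 1).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    precision = [1.0 / prior_var] * n_arms  # posterior precision per arm
    mean = [prior_mean] * n_arms            # posterior mean per arm
    counts = [0] * n_arms                   # number of pulls per arm
    std_normal = NormalDist()

    for t in range(1, horizon + 1):
        # Quantile level rises with t, so the exploration bonus grows slowly.
        z = std_normal.inv_cdf(1.0 - 1.0 / (t + 1))
        scores = [mean[i] + z * (1.0 / precision[i]) ** 0.5
                  for i in range(n_arms)]
        arm = max(range(n_arms), key=scores.__getitem__)

        # Draw a noisy reward from the chosen arm.
        reward = rng.gauss(true_means[arm], noise_var ** 0.5)

        # Conjugate Gaussian update of the chosen arm's posterior.
        new_precision = precision[arm] + 1.0 / noise_var
        mean[arm] = (precision[arm] * mean[arm] + reward / noise_var) / new_precision
        precision[arm] = new_precision
        counts[arm] += 1

    return counts, mean


if __name__ == "__main__":
    pulls, posterior_means = bayes_ucb_gaussian([0.0, 2.0, 0.5], horizon=1000)
    print("pulls per arm:", pulls)
    print("posterior means:", posterior_means)
```

Changing `prior_mean` and `prior_var` changes how long the rule keeps exploring each arm, which is one way to see why the choice of prior matters when such an algorithm is used to model human choices.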
