Human-AI Learning Performance in Multi-Armed Bandits

People frequently face challenging decision-making problems in which outcomes are uncertain or unknown. Artificial intelligence (AI) algorithms can outperform humans at learning such tasks, which creates an opportunity for AI agents to help people learn them more effectively. In this work, we use the multi-armed bandit as a controlled setting in which to explore this direction. We pair humans with a selection of agents and observe how well each human-agent team performs. We find that team performance can exceed both human and agent performance in isolation. Interestingly, we also find that an agent's performance in isolation does not necessarily predict the human-agent team's performance: a drop in agent performance can lead to a disproportionately large drop in team performance, or in some settings can even improve it. Pairing a human with an agent that performs slightly better than them can make them perform much better, while pairing them with an agent that performs about the same can make them perform much worse. Further, our results suggest that people have different exploration strategies and might perform better with agents that match their strategy. Overall, optimizing human-agent team performance requires going beyond optimizing agent performance, to understanding how the agent's suggestions influence human decision-making.
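To make the setting concrete, below is a minimal, illustrative sketch in Python of a stochastic Bernoulli bandit paired with a standard UCB1 agent whose suggestions a simulated "human" follows only part of the time. The names here (BernoulliBandit, UCB1Agent, run_team, p_follow) are hypothetical and chosen for illustration only; the paper's actual environment, agents, and human behavior are not specified by this sketch and may differ.

```python
import math
import random


class BernoulliBandit:
    """K-armed bandit where arm i pays 1 with probability probs[i], else 0."""

    def __init__(self, probs, seed=0):
        self.probs = probs
        self.rng = random.Random(seed)

    def pull(self, arm):
        return 1.0 if self.rng.random() < self.probs[arm] else 0.0


class UCB1Agent:
    """Standard UCB1: try each arm once, then pick argmax of mean + sqrt(2 ln t / n)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms
        self.t = 0  # total number of observed pulls

    def suggest(self):
        # Suggest any arm that has not been tried yet.
        for arm, c in enumerate(self.counts):
            if c == 0:
                return arm
        ucb = [
            v + math.sqrt(2.0 * math.log(self.t) / c)
            for v, c in zip(self.values, self.counts)
        ]
        return max(range(len(ucb)), key=lambda a: ucb[a])

    def update(self, arm, reward):
        self.t += 1
        self.counts[arm] += 1
        # Incremental update of the empirical mean reward for this arm.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def run_team(bandit, agent, n_rounds=500, p_follow=0.8, seed=1):
    """Toy human-agent team: the human follows the agent's suggestion with
    probability p_follow and otherwise explores a uniformly random arm."""
    rng = random.Random(seed)
    n_arms = len(bandit.probs)
    total = 0.0
    for _ in range(n_rounds):
        suggestion = agent.suggest()
        arm = suggestion if rng.random() < p_follow else rng.randrange(n_arms)
        reward = bandit.pull(arm)
        agent.update(arm, reward)  # agent learns from the arm actually played
        total += reward
    return total


if __name__ == "__main__":
    bandit = BernoulliBandit([0.2, 0.5, 0.8])
    print(run_team(bandit, UCB1Agent(3)))
```

Sweeping p_follow in a simulation like this is one simple way to probe how reliance on an agent's suggestions shapes team reward, though it is only a caricature of the human exploration strategies studied in the paper.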
