When Humans and Machines Make Joint Decisions: A Non-Symmetric Bandit Model

How can humans and machines learn to make joint decisions? This question has become important in domains such as medicine, law, and finance. We approach it from a theoretical perspective and formalize our intuitions about human-machine decision making in a non-symmetric bandit model, following the running example of a doctor assisted by a computer program. We show that exploration in our model is generally hard: unless one is willing to make assumptions about how human and machine interact, the machine cannot explore efficiently. We highlight one such assumption, policy space independence, which resolves the coordination problem and allows both players to explore independently. Our results shed light on the fundamental difficulties that arise when humans and machines make decisions together, and we discuss practical implications for the design of algorithmic decision systems.
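To make the non-symmetric setup concrete, here is a minimal simulation sketch. The specific choices are assumptions for illustration, not the paper's construction: the machine runs a standard EXP3 learner (in the style of Auer et al.), the human follows a fixed stochastic override policy (a crude stand-in for algorithm aversion), and the machine only receives feedback for rounds in which its recommendation is actually played. Even this toy version hints at why naive exploration breaks down when the two policies are coupled.

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5         # number of arms (e.g., candidate treatments)
T = 10_000    # rounds
means = rng.uniform(0.2, 0.8, size=K)  # hypothetical Bernoulli reward means

# Machine: a standard EXP3 learner with uniform exploration mixing.
# This is a generic adversarial-bandit baseline, not the paper's algorithm.
gamma = min(1.0, np.sqrt(K * np.log(K) / ((np.e - 1) * T)))
log_w = np.zeros(K)

# Human: overrides the machine with a fixed favorite arm some of the time
# (an assumed, deliberately simple model of the human's policy).
human_favorite = 0
override_prob = 0.3

total_reward = 0.0
for t in range(T):
    w = np.exp(log_w - log_w.max())
    p = (1 - gamma) * w / w.sum() + gamma / K  # machine's sampling distribution
    recommended = rng.choice(K, p=p)
    played = human_favorite if rng.random() < override_prob else recommended
    reward = float(rng.random() < means[played])
    total_reward += reward
    # Importance-weighted EXP3 update, applied only when the recommendation
    # was actually followed: the machine learns nothing from overridden
    # rounds, one way the non-symmetric setting defeats naive exploration.
    if played == recommended:
        log_w[recommended] += (gamma / K) * reward / p[recommended]

print(f"avg reward {total_reward / T:.3f} vs best arm mean {means.max():.3f}")
```

Varying `override_prob` in this sketch shows the coordination problem at a glance: the more often the human deviates, the more biased and data-starved the machine's reward estimates become.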
