Human-AI Collaboration with Bandit Feedback

Human-machine complementarity matters when neither the algorithm nor the human yields dominant performance across all instances in a given domain. Most research on algorithmic decision-making centers solely on the algorithm's performance, and recent work that explores human-machine collaboration has framed decision-making problems as classification tasks. In this paper, we propose and develop a solution for a novel human-machine collaboration problem in a bandit feedback setting. Our solution exploits human-machine complementarity to maximize decision rewards. We then extend our approach to settings with multiple human decision-makers. We demonstrate the effectiveness of our proposed methods using both synthetic and real human responses, and find that our methods outperform both the algorithm and the human when each makes decisions alone. We also show how personalized routing in the presence of multiple human decision-makers can further improve human-machine team performance.
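
To make the setting concrete, below is a minimal, hypothetical sketch (not the paper's algorithm): a contextual epsilon-greedy router that, for each instance, chooses whether to act on the algorithm's decision or defer to a simulated human, observes only the reward of the chosen option (bandit feedback), and updates per-option linear reward estimates. The reward functions, feature dimensions, and exploration rate are all assumptions made purely for illustration.

```python
# Hypothetical sketch of a bandit-feedback human-machine routing loop.
# Arm 0 = act on the algorithm's decision, arm 1 = defer to the human.
import numpy as np

rng = np.random.default_rng(0)
n_rounds, d, eps = 5000, 5, 0.1

# Per-arm ridge-regression statistics for estimating expected reward from context.
A = [np.eye(d) for _ in range(2)]   # accumulates X^T X + I
b = [np.zeros(d) for _ in range(2)] # accumulates X^T r

def algorithm_reward(x):
    # Assumed reward model: the algorithm is strong on one region of the context space.
    return float(rng.random() < (0.8 if x[0] > 0 else 0.4))

def human_reward(x):
    # Assumed reward model: the human is strong on the complementary region.
    return float(rng.random() < (0.4 if x[0] > 0 else 0.8))

total = 0.0
for t in range(n_rounds):
    x = rng.normal(size=d)                           # instance context
    preds = [np.linalg.solve(A[a], b[a]) @ x for a in range(2)]
    arm = rng.integers(2) if rng.random() < eps else int(np.argmax(preds))
    r = algorithm_reward(x) if arm == 0 else human_reward(x)  # only this reward is observed
    A[arm] += np.outer(x, x)
    b[arm] += r * x
    total += r

print(f"average reward of the routed human-machine team: {total / n_rounds:.3f}")
```

Because each decision-maker's expected reward depends on the context, the router learns a personalized assignment and can exceed the average reward of either the algorithm or the human acting alone in this toy setup.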
