How Expert Confidence Can Improve Collective Decision-Making in Contextual Multi-Armed Bandit Problems

In collective decision-making (CDM), a group of experts with a shared set of values and a common goal must combine their knowledge to make a collectively optimal decision. Whereas existing research on CDM primarily focuses on binary decisions, we focus here on CDM applied to contextual multi-armed bandit (CMAB) problems, where the goal is to exploit contextual information to select the best arm from a set of candidates. To address the limiting assumptions of prior work, we introduce confidence estimates and propose a novel approach to deciding with expert advice that can take advantage of these estimates. We further show that, when confidence estimates are imperfect, the proposed approach is more robust than the classical confidence-weighted majority vote.
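For readers unfamiliar with the baseline named above, the following is a minimal sketch of a confidence-weighted majority vote over arms. It is not the approach proposed in this work, and the abstract does not specify the exact form of the baseline. The sketch assumes that each expert reports one recommended arm together with a confidence in (0, 1), and that each vote is weighted by the log-odds of that confidence, a common textbook choice. The function name `confidence_weighted_vote` and the clipping constant are hypothetical.

```python
import math
from collections import defaultdict


def confidence_weighted_vote(votes, eps=1e-6):
    """Pick an arm from (arm, confidence) pairs, one pair per expert.

    Assumption: each vote is weighted by log(c / (1 - c)), the log-odds
    of the expert's stated confidence c. This weighting is illustrative;
    the source does not state which variant it uses.
    """
    scores = defaultdict(float)
    for arm, confidence in votes:
        # Clip the confidence so that values of exactly 0 or 1
        # do not produce infinite weights.
        c = min(max(confidence, eps), 1.0 - eps)
        scores[arm] += math.log(c / (1.0 - c))
    # Return the arm with the largest total weighted support.
    return max(scores, key=scores.get)


# One highly confident expert can outweigh two weakly confident ones.
choice = confidence_weighted_vote([(0, 0.6), (0, 0.6), (1, 0.9)])
```

In this sketch a single expert with confidence 0.9 outweighs two experts with confidence 0.6, which also illustrates why the rule is sensitive to miscalibrated confidence estimates.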
