Wisdom of crowds versus groupthink: learning in groups and in isolation

We evaluate the asymptotic performance of boundedly rational strategies in multi-armed bandit problems, where performance is measured by the tendency, in the limit, to play optimal actions either (i) in isolation or (ii) in networks of other learners. We show that, for many strategies commonly employed in economics, psychology, and machine learning, performance in isolation and performance in networks are essentially unrelated. Our results suggest that the performance of various common boundedly rational strategies depends crucially on the social context (if any) in which they are employed.
