Boosting medical diagnostics by pooling independent judgments

Significance

Collective intelligence is considered to be one of the most promising approaches to improve decision making. However, up to now, little is known about the conditions underlying the emergence of collective intelligence in real-world contexts. Focusing on two key areas of medical diagnostics (breast and skin cancer detection), we here show that similarity in doctors’ accuracy is a key factor underlying the emergence of collective intelligence in these contexts. This result paves the way for innovative and more effective approaches to decision making in medical diagnostics and beyond, and to the scientific analyses of those approaches.

Collective intelligence refers to the ability of groups to outperform individual decision makers when solving complex cognitive problems. Despite its potential to revolutionize decision making in a wide range of domains, including medical, economic, and political decision making, at present, little is known about the conditions underlying collective intelligence in real-world contexts. We here focus on two key areas of medical diagnostics, breast and skin cancer detection. Using a simulation study that draws on large real-world datasets, involving more than 140 doctors making more than 20,000 diagnoses, we investigate when combining the independent judgments of multiple doctors outperforms the best doctor in a group. We find that similarity in diagnostic accuracy is a key condition for collective intelligence: Aggregating the independent judgments of doctors outperforms the best doctor in a group whenever the diagnostic accuracy of doctors is relatively similar, but not when doctors’ diagnostic accuracy differs too much. This intriguingly simple result is highly robust and holds across different group sizes, performance levels of the best doctor, and collective intelligence rules. The enabling role of similarity, in turn, is explained by its systematic effects on the number of correct and incorrect decisions of the best doctor that are overruled by the collective. By identifying a key factor underlying collective intelligence in two important real-world contexts, our findings pave the way for innovative and more effective approaches to complex real-world decision making, and to the scientific analyses of those approaches.
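The role of similarity can be illustrated with a small calculation. The sketch below is not the simulation used in the study: it assumes a simple majority rule, summarizes each doctor by a single hypothetical accuracy value, and treats doctors' errors as statistically independent, which real diagnostic data need not satisfy. The function name `majority_accuracy` and all accuracy values are made up for illustration.

```python
from itertools import product


def majority_accuracy(accuracies):
    """Exact probability that a strict majority of raters is correct.

    Assumption (not from the study): raters err independently, and each
    rater is described by one probability of being correct.
    """
    n = len(accuracies)
    if n % 2 == 0:
        raise ValueError("use an odd group size so that ties cannot occur")
    total = 0.0
    # Enumerate every pattern of correct (True) / incorrect (False) raters.
    for outcome in product([True, False], repeat=n):
        if sum(outcome) > n / 2:  # strict majority is correct
            prob = 1.0
            for correct, p in zip(outcome, accuracies):
                prob *= p if correct else 1.0 - p
            total += prob
    return total


# Hypothetical groups of three doctors.
similar = [0.80, 0.78, 0.76]      # accuracies close to each other
dissimilar = [0.90, 0.65, 0.60]   # best doctor far ahead of the others

for label, group in (("similar", similar), ("dissimilar", dissimilar)):
    print(label, round(majority_accuracy(group), 4), "best:", max(group))
```

Under these assumptions, the majority of the similar group is correct with probability of about 0.876, above its best member at 0.80, whereas the majority of the dissimilar group reaches only about 0.813, below its best member at 0.90. In the second group the two weaker doctors overrule correct decisions of the best doctor more often than they rescue incorrect ones.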
