"Dr. Detective": combining gamification techniques and crowdsourcing to create a gold standard in medical text

This paper proposes a design for a gamified crowdsourcing workflow that extracts annotations from medical text. Developed in the context of a general crowdsourcing platform, Dr. Detective is a game with a purpose that engages medical experts in solving annotation tasks on medical case reports, and is tailored to capture disagreement between annotators. It incorporates incentives such as learning features to motivate continued involvement of the expert crowd. The game was designed to identify expressions valuable for training NLP tools, and to interpret their relations in the context of medical diagnosis. In this way, we address the main problem in gathering ground truth from experts: low inter-annotator agreement is typically caused by different interpretations of the text. We report on the results of a pilot study assessing the usefulness of this game. The results show that the quality of the annotations produced by the expert crowd is comparable to that of an NLP parser. Furthermore, we observed that allowing game users to access each other's answers increases agreement between annotators.
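To make the notion of inter-annotator agreement over text annotations concrete, the sketch below computes a mean pairwise Jaccard overlap between annotators' span sets. This is a minimal illustration, not the metric used in the paper: the representation of annotations as (start, end, label) tuples, the annotator names, and the choice of Jaccard overlap are all assumptions made for the example.

```python
from itertools import combinations

def pairwise_agreement(annotations):
    """Mean pairwise Jaccard overlap between annotators' span sets.

    `annotations` maps an annotator id to a set of (start, end, label)
    tuples marking annotated text spans. Returns a value in [0, 1],
    where 1 means all annotators marked identical spans.
    """
    pairs = list(combinations(annotations.values(), 2))
    if not pairs:
        return 1.0
    scores = []
    for a, b in pairs:
        union = a | b
        # Jaccard overlap: shared spans divided by all distinct spans.
        scores.append(len(a & b) / len(union) if union else 1.0)
    return sum(scores) / len(scores)

# Hypothetical example: two experts agree on one span in a case
# report but mark different spans elsewhere.
experts = {
    "expert_1": {(10, 18, "symptom"), (42, 55, "diagnosis")},
    "expert_2": {(10, 18, "symptom"), (60, 70, "treatment")},
}
print(pairwise_agreement(experts))  # 0.333... (1 shared of 3 spans)
```

Under a disagreement-aware view such as CrowdTruth, a low score like this is not necessarily noise to be eliminated; it can signal genuinely ambiguous text, which is exactly what the game is designed to capture.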
