Crowdsourcing in biomedicine: challenges and opportunities

The use of crowdsourcing to solve important but complex problems in biomedical and clinical sciences is growing and encompasses a wide variety of approaches. The crowd is diverse and includes online marketplace workers, health information seekers, science enthusiasts and domain experts. In this article, we review and highlight recent studies that use crowdsourcing to advance biomedicine. We classify these studies into two broad categories: (i) mining big data generated by a crowd (e.g. search logs) and (ii) active crowdsourcing via specific technical platforms such as labor markets, wikis, scientific games and community challenges. By describing each study in detail, we demonstrate the applicability of different methods in a variety of domains in biomedical research, including genomics, biocuration and clinical research. Furthermore, we discuss the strengths and limitations of different crowdsourcing platforms. Finally, we identify important emerging trends, opportunities and remaining challenges for future crowdsourcing research in biomedicine.
