Automatically Detecting Failures in Natural Language Processing Tools for Online Community Text

Background The prevalence and value of patient-generated health text are increasing, but processing such text remains problematic. Although existing biomedical natural language processing (NLP) tools are appealing, most were developed to process clinician- or researcher-generated text, such as clinical notes or journal articles. In addition to being constructed for different types of text, other challenges of using existing NLP include constantly changing technologies, source vocabularies, and characteristics of text. These continuously evolving challenges warrant the need for applying low-cost systematic assessment. However, the primarily accepted evaluation method in NLP, manual annotation, requires tremendous effort and time. Objective The primary objective of this study is to explore an alternative approach—using low-cost, automated methods to detect failures (eg, incorrect boundaries, missed terms, mismapped concepts) when processing patient-generated text with existing biomedical NLP tools. We first characterize common failures that NLP tools can make in processing online community text. We then demonstrate the feasibility of our automated approach in detecting these common failures using one of the most popular biomedical NLP tools, MetaMap. Methods Using 9657 posts from an online cancer community, we explored our automated failure detection approach in two steps: (1) to characterize the failure types, we first manually reviewed MetaMap’s commonly occurring failures, grouped the inaccurate mappings into failure types, and then identified causes of the failures through iterative rounds of manual review using open coding, and (2) to automatically detect these failure types, we then explored combinations of existing NLP techniques and dictionary-based matching for each failure cause. Finally, we manually evaluated the automatically detected failures. Results From our manual review, we characterized three types of failure: (1) boundary failures, (2) missed term failures, and (3) word ambiguity failures. Within these three failure types, we discovered 12 causes of inaccurate mappings of concepts. We used automated methods to detect almost half of 383,572 MetaMap’s mappings as problematic. Word sense ambiguity failure was the most widely occurring, comprising 82.22% of failures. Boundary failure was the second most frequent, amounting to 15.90% of failures, while missed term failures were the least common, making up 1.88% of failures. The automated failure detection achieved precision, recall, accuracy, and F1 score of 83.00%, 92.57%, 88.17%, and 87.52%, respectively. Conclusions We illustrate the challenges of processing patient-generated online health community text and characterize failures of NLP tools on this patient-generated health text, demonstrating the feasibility of our low-cost approach to automatically detect those failures. Our approach shows the potential for scalable and effective solutions to automatically assess the constantly evolving NLP tools and source vocabularies to process patient-generated text.

[1]  Gondy Leroy,et al.  Research Paper: Consumer Health Concepts That Do Not Map to the UMLS: Where Do They Fit? , 2008, J. Am. Medical Informatics Assoc..

[2]  Carolyn Penstein Rosé,et al.  Understanding participant behavior trajectories in online health support groups using automatic extraction methods , 2012, GROUP '12.

[3]  James Geller,et al.  Analysis of a Study of the Users, Uses, and Future Agenda of the UMLS , 2007, J. Am. Medical Informatics Assoc..

[4]  Dan Klein,et al.  Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network , 2003, NAACL.

[5]  Alexa T. McCray,et al.  Application of Technology: Design and Implementation of a National Clinical Trials Registry , 2000, J. Am. Medical Informatics Assoc..

[6]  Jina Huh,et al.  Text classification for assisting moderators in online health communities , 2013, J. Biomed. Informatics.

[7]  G. Eysenbach Medicine 2.0: Social Networking, Collaboration, Participation, Apomediation, and Openness , 2008, Journal of medical Internet research.

[8]  Jure Leskovec,et al.  No country for old members: user lifecycle and linguistic change in online communities , 2013, WWW.

[9]  Jeffrey Heer,et al.  Forum77: An Analysis of an Online Health Forum Dedicated to Addiction Recovery , 2015, CSCW.

[10]  Jeffrey Heer,et al.  Identifying medical terms in patient-authored text: a crowdsourcing-based approach , 2013, J. Am. Medical Informatics Assoc..

[11]  Mark Dredze,et al.  How Social Media Will Change Public Health , 2012, IEEE Intelligent Systems.

[12]  Wendy W. Chapman,et al.  A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries , 2001, J. Biomed. Informatics.

[13]  Qing Zeng-Treitler,et al.  Exploring and developing consumer health vocabularies. , 2006, Journal of the American Medical Informatics Association : JAMIA.

[14]  J. Frost,et al.  Social Uses of Personal Health Information Within PatientsLikeMe, an Online Patient Community: What Can Happen When Patients Have Access to One Another’s Data , 2008, Journal of medical Internet research.

[15]  Dietrich Rebholz-Schuhmann,et al.  Ontology refinement for improved information retrieval , 2010, Inf. Process. Manag..

[16]  Alan R. Aronson,et al.  Towards linking patients and clinical information: detecting UMLS concepts in e-mail , 2003, J. Biomed. Informatics.

[17]  Betsy L. Humphreys,et al.  Technical Milestone: The Unified Medical Language System: An Informatics Research Collaboration , 1998, J. Am. Medical Informatics Assoc..

[18]  M. Stratton,et al.  The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website , 2004, British Journal of Cancer.

[19]  Thomas C. Rindflesch,et al.  MedPost: a part-of-speech tagger for bioMedical text , 2004, Bioinform..

[20]  J. Frost,et al.  Sharing Health Data for Better Outcomes on PatientsLikeMe , 2010, Journal of medical Internet research.

[21]  M. Massagli,et al.  Accelerated clinical discovery using self-reported patient data collected online and a patient-matching algorithm , 2011, Nature Biotechnology.

[22]  Sunghwan Sohn,et al.  Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications , 2010, J. Am. Medical Informatics Assoc..

[23]  Leysia Palen,et al.  "Voluntweeters": self-organizing by digital volunteers in times of crisis , 2011, CHI.

[24]  Erik M. van Mulligen,et al.  Using rule-based natural language processing to improve disease normalization in biomedical text , 2012, J. Am. Medical Informatics Assoc..

[25]  John M. Levine,et al.  To stay or leave?: the relationship of emotional and informational support to commitment in online health support groups , 2012, CSCW.

[26]  Helen L. Johnson,et al.  Concept recognition for extracting protein interaction relations from biomedical text , 2008, Genome Biology.

[27]  George Hripcsak,et al.  Natural language processing in an operational clinical information system , 1995, Natural Language Engineering.

[28]  Christopher G. Chute,et al.  The National Center for Biomedical Ontology , 2012, J. Am. Medical Informatics Assoc..

[29]  Alan R. Aronson,et al.  An overview of MetaMap: historical perspective and recent advances , 2010, J. Am. Medical Informatics Assoc..

[30]  Qing Zeng-Treitler,et al.  Research Paper: Assisting Consumer Health Information Retrieval with Query Recommendations , 2006, J. Am. Medical Informatics Assoc..