Evaluating the state of the art in coreference resolution for electronic medical records

BACKGROUND The fifth i2b2/VA Workshop on Natural Language Processing Challenges for Clinical Records conducted a systematic review on resolution of noun phrase coreference in medical records. Informatics for Integrating Biology and the Bedside (i2b2) and the Veterans Affair (VA) Consortium for Healthcare Informatics Research (CHIR) partnered to organize the coreference challenge. They provided the research community with two corpora of medical records for the development and evaluation of the coreference resolution systems. These corpora contained various record types (ie, discharge summaries, pathology reports) from multiple institutions. METHODS The coreference challenge provided the community with two annotated ground truth corpora and evaluated systems on coreference resolution in two ways: first, it evaluated systems for their ability to identify mentions of concepts and to link together those mentions. Second, it evaluated the ability of the systems to link together ground truth mentions that refer to the same entity. Twenty teams representing 29 organizations and nine countries participated in the coreference challenge. RESULTS The teams' system submissions showed that machine-learning and rule-based approaches worked best when augmented with external knowledge sources and coreference clues extracted from document structure. The systems performed better in coreference resolution when provided with ground truth mentions. Overall, the systems struggled in solving coreference resolution for cases that required domain knowledge.

[1]  Nianwen Xue,et al.  CoNLL-2011 Shared Task: Modeling Unrestricted Coreference in OntoNotes , 2011, CoNLL Shared Task.

[2]  Peter Szolovits,et al.  Evaluating the state-of-the-art in automatic de-identification. , 2007, Journal of the American Medical Informatics Association : JAMIA.

[3]  Lynette Hirschman,et al.  A Model-Theoretic Coreference Scoring Scheme , 1995, MUC.

[4]  Alan R. Aronson,et al.  Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program , 2001, AMIA.

[5]  Alex A. T. Bui,et al.  Tracking medication information across medical records , 2009, AMIA.

[6]  Oussama El-Rawas,et al.  Machine learning-based coreference resolution of concepts in clinical documents , 2012, J. Am. Medical Informatics Assoc..

[7]  Christopher G. Chute,et al.  Constructing Evaluation Corpora for Automated Clinical Named Entity Recognition , 2008, LREC.

[8]  Özlem Uzuner,et al.  Viewpoint Paper: Recognizing Obesity and Comorbidities in Sparse Data , 2009, J. Am. Medical Informatics Assoc..

[9]  Peter Szolovits,et al.  MCORES: a system for noun phrase coreference resolution for clinical records , 2012, J. Am. Medical Informatics Assoc..

[10]  Demetrios G. Glinos A Search Based Method for Clinical Text Coreference Resolution , 2011 .

[11]  Xiaoqiang Luo,et al.  On Coreference Resolution Performance Metrics , 2005, HLT.

[12]  Ted Briscoe,et al.  Statistical Anaphora Resolution in Biomedical Texts , 2008, COLING.

[13]  Wendy W. Chapman,et al.  Anaphoric relations in the clinical narrative: corpus creation , 2011, J. Am. Medical Informatics Assoc..

[14]  Stephen Pulman,et al.  Evaluating the State of the Art , 1995 .

[15]  Siddhartha Jonnalagadda,et al.  Coreference analysis in clinical notes: a multi-pass sieve with alternate anaphora resolution modules , 2012, J. Am. Medical Informatics Assoc..

[16]  Tian Ye He,et al.  Coreference resolution on entities and events for hospital discharge summaries , 2007 .

[17]  Christopher D. Manning,et al.  Proceedings of the Fifteenth Conference on Computational Natural Language Learning, CoNLL 2011, Portland, Oregon, USA, June 23-24, 2011 , 2011, CoNLL.

[18]  Seung-won Hwang,et al.  Web scale taxonomy cleansing , 2011, Proc. VLDB Endow..

[19]  Raymond J. Mooney,et al.  Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing , 2005 .

[20]  Dan Klein,et al.  Simple Coreference Resolution with Rich Syntactic and Semantic Features , 2009, EMNLP.

[21]  Abdul V. Roudsari,et al.  Coreference resolution in clinical discharge summaries, progress notes, surgical and pathology reports: a unified lexical approach , 2011 .

[22]  Yuan Luo,et al.  Identifying patient smoking status from medical discharge records. , 2008, Journal of the American Medical Informatics Association : JAMIA.

[23]  Shuying Shen,et al.  2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text , 2011, J. Am. Medical Informatics Assoc..

[24]  Haixun Wang,et al.  Short Text Conceptualization Using a Probabilistic Knowledgebase , 2011, IJCAI.

[25]  K. J. Evans,et al.  Computer Intensive Methods for Testing Hypotheses: An Introduction , 1990 .

[26]  Dan Roth,et al.  Using domain knowledge and domain-inspired discourse model for coreference resolution for clinical narratives , 2013, J. Am. Medical Informatics Assoc..

[27]  Özlem Uzuner,et al.  Extracting medication information from clinical text , 2010, J. Am. Medical Informatics Assoc..

[28]  James W. Cooper,et al.  Automatically extracting cancer disease characteristics from pathology reports into a Disease Knowledge Representation Model , 2009, J. Biomed. Informatics.

[29]  C. Fellbaum An Electronic Lexical Database , 1998 .

[30]  Wendy G. Lehnert,et al.  Using Decision Trees for Coreference Resolution , 1995, IJCAI.

[31]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[32]  Wendy W. Chapman,et al.  Coreference resolution: A review of general methodologies and applications in the clinical domain , 2011, J. Biomed. Informatics.