MCORES: a system for noun phrase coreference resolution for clinical records

OBJECTIVE Narratives of electronic medical records contain information that can be useful for clinical practice and multi-purpose research. This information must be put into a structured form before automated systems can use it. Coreference resolution is one step in transforming narratives into a structured form.

METHODS This study presents a medical coreference resolution system (MCORES) for noun phrases in four frequently used clinical semantic categories: persons, problems, treatments, and tests. MCORES treats coreference resolution as a binary classification task. For each pair of concepts within a semantic category, it decides whether the two are coreferent, then clusters the coreferent pairs into chains. MCORES uses an enhanced set of lexical, syntactic, and semantic features. Some MCORES features measure the distance between various representations of the concepts in a pair, and these distances can be asymmetric.

RESULTS AND CONCLUSION MCORES was compared with an in-house baseline that uses only single-perspective 'token overlap' and 'number agreement' features. MCORES outperformed the baseline, and its enhanced features contributed significantly to performance. MCORES was also compared against two available third-party, open-domain systems, RECONCILE (ACL09) and the Beautiful Anaphora Resolution Toolkit (BART), and outperformed both on clinical records.
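
The pairwise formulation described under METHODS can be illustrated with a minimal sketch. It is not the MCORES implementation: the function names (`token_overlap`, `is_coreferent`, `build_chains`) are hypothetical, a fixed threshold rule stands in for a trained binary classifier, and merging positive pairs by transitive closure is only one common way to form chains. The overlap feature is computed in one direction at a time, which shows how a pairwise feature can be asymmetric.

```python
from itertools import combinations


def token_overlap(a: str, b: str) -> float:
    """Fraction of the tokens of `a` that also occur in `b` (asymmetric)."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a)


def is_coreferent(a: str, b: str, threshold: float = 0.5) -> bool:
    """Hypothetical stand-in for a trained binary classifier.

    Assumption: a pair is positive when overlap in BOTH directions
    reaches the threshold. A real system would learn this decision
    from many lexical, syntactic, and semantic features.
    """
    return min(token_overlap(a, b), token_overlap(b, a)) >= threshold


def build_chains(mentions, classify=is_coreferent):
    """Cluster mentions into chains by merging every positive pair."""
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if classify(mentions[i], mentions[j]):
            parent[find(j)] = find(i)

    chains = {}
    for i, mention in enumerate(mentions):
        chains.setdefault(find(i), []).append(mention)
    # Singletons are not chains; keep clusters with two or more mentions.
    return [chain for chain in chains.values() if len(chain) > 1]
```

In this sketch, `token_overlap("aspirin", "aspirin 81 mg")` is 1.0 while the reverse direction is 1/3, so the two directions of the same pair can disagree. In keeping with the abstract, `build_chains` would be called separately on the mentions of each semantic category (persons, problems, treatments, tests).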
