Anaphoric reference in clinical reports: Characteristics of an annotated corpus

MOTIVATION: Expressions that refer to a real-world entity already mentioned in a narrative are often considered anaphoric. For example, in the sentence "The pain comes and goes," the expression "the pain" probably refers to a previous mention of pain. Interpreting the meaning involves resolving the anaphoric reference: deciding which expression in the text is the correct antecedent of the referring expression, also called an anaphor. We annotated a set of 180 clinical reports (surgical pathology reports, radiology reports, discharge summaries, and emergency department reports) from two institutions to indicate all anaphor-antecedent pairs.

OBJECTIVE: The objective of this study is to describe the characteristics of the corpus in terms of the frequency of anaphoric relations, the syntactic and semantic nature of the members of the pairs, and the types of anaphoric relations that occur. Understanding how anaphoric reference is exhibited in clinical reports is critical to developing reference resolution algorithms and to identifying peculiarities of clinical text that may alter which features and methodologies will be successful for automated anaphora resolution.

RESULTS: We found that anaphoric reference is prevalent in all types of clinical reports; that annotations of noun phrases, semantic type, and section headings may be especially important for automated resolution of anaphoric reference; and that separate modules for reference resolution may be required for different report types, different institutions, and different types of anaphors. Accurate resolution will probably require extensive domain knowledge, especially for pathology and radiology reports, which contain more part/whole and set/subset relations.

CONCLUSION: We hope researchers will leverage the annotations in this corpus to develop automated algorithms and will add to the annotations to generate a more extensive corpus.
