Negation’s Not Solved: Generalizability Versus Optimizability in Clinical Natural Language Processing

A review of published work in clinical natural language processing (NLP) may suggest that the negation detection task has been “solved.” This work proposes that an optimizable solution does not equal a generalizable solution. We introduce a new machine learning-based Polarity Module for detecting negation in clinical text, and extensively compare its performance across domains. Using four manually annotated corpora of clinical text, we show that negation detection performance suffers when there is no in-domain development (for manual methods) or training data (for machine learning-based methods). Various factors (e.g., annotation guidelines, named entity characteristics, the amount of data, and lexical and syntactic context) play a role in making generalizability difficult, but none completely explains the phenomenon. Furthermore, generalizability remains challenging because it is unclear whether to use a single source for accurate data, combine all sources into a single model, or apply domain adaptation methods. The most reliable means to improve negation detection is to manually annotate in-domain training data (or, perhaps, manually modify rules); this is a strategy for optimizing performance, rather than generalizing it. These results suggest a direction for future work in domain-adaptive and task-adaptive methods for clinical NLP.
