Biomedical negation scope detection with conditional random fields

OBJECTIVE Negation is a linguistic phenomenon that marks the absence of an entity or event. Negated events are frequently reported in both biological literature and clinical notes. Text mining applications benefit from the detection of negation and its scope. However, due to the complexity of language, identifying the scope of negation in a sentence is not a trivial task. DESIGN Conditional random fields (CRF), a supervised machine-learning algorithm, were used to train models to detect negation cue phrases and their scope in both biological literature and clinical notes. The models were trained on the publicly available BioScope corpus. MEASUREMENT The performance of the CRF models was evaluated on identifying the negation cue phrases and their scope by calculating recall, precision and F1-score. The models were compared with four competitive baseline systems. RESULTS The best CRF-based model performed statistically better than all baseline systems and NegEx, achieving an F1-score of 98% and 95% on detecting negation cue phrases and their scope in clinical notes, and an F1-score of 97% and 85% on detecting negation cue phrases and their scope in biological literature. CONCLUSIONS This approach is robust, as it can identify negation scope in both biological and clinical text. To benefit text mining applications, the system is publicly available as a Java API and as an online application at http://negscope.askhermes.org.

[1]  Christopher D. Manning,et al.  Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger , 2000, EMNLP.

[2]  Wendy W. Chapman,et al.  A Simple Algorithm for Identifying Negated Findings and Diseases in Discharge Summaries , 2001, J. Biomed. Informatics.

[3]  Massimo Poesio,et al.  Negation of protein-protein interactions: analysis and extraction , 2007, ISMB/ECCB.

[4]  Graciela Gonzalez,et al.  BANNER: An Executable Survey of Advances in Biomedical Named Entity Recognition , 2007, Pacific Symposium on Biocomputing.

[5]  Yang Huang,et al.  A novel hybrid approach to automated negation detection in clinical radiology reports. , 2007, Journal of the American Medical Informatics Association : JAMIA.

[6]  George Hripcsak,et al.  Development, implementation, and a cognitive evaluation of a definitional question answering system for physicians , 2007, J. Biomed. Informatics.

[7]  Slava M. Katz,et al.  Estimation of probabilities from sparse data for the language model component of a speech recognizer , 1987, IEEE Trans. Acoust. Speech Signal Process..

[8]  Hong Yu,et al.  Extracting synonymous gene and protein terms from biological literature , 2003, ISMB.

[9]  János Csirik,et al.  The BioScope corpus: annotation for negation, uncertainty and their scope in biomedical texts , 2008, BioNLP.

[10]  Sholom M. Weiss,et al.  An Empirical Comparison of Pattern Recognition, Neural Nets, and Machine Learning Classification Methods , 1989, IJCAI.

[11]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[12]  Peter L. Elkin,et al.  A controlled trial of automated classification of negation from clinical notes , 2005, BMC Medical Informatics Decis. Mak..

[13]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[14]  Roser Morante,et al.  A Metalearning Approach to Processing the Scope of Negation , 2009, CoNLL.

[15]  Lior Rokach,et al.  Context-Sensitive Medical Information Retrieval , 2004, MedInfo.

[16]  Burr Settles ABNER: an open source tool for automatically tagging genes, proteins and other entity names in text , 2005 .

[17]  Ido Dagan,et al.  Contextual Word Similarity and Estimation from Sparse Data , 1993, ACL.

[18]  Hong Yu,et al.  Automatically Extracting Information Needs from Ad Hoc Clinical Questions , 2008, AMIA.

[19]  Prakash M. Nadkarni,et al.  Research Paper: Use of General-purpose Negation Detection to Augment Concept Indexing of Medical Documents: A Quantitative Study Using the UMLS , 2001, J. Am. Medical Informatics Assoc..