Automatic negation detection in narrative pathology reports

OBJECTIVE To detect negations of medical entities in free-text pathology reports using three different approaches, and to evaluate their performance.

METHODS AND MATERIAL Three approaches were applied to negation detection. The lexicon-based approach was a rule-based method relying on trigger terms and termination clues. The syntax-based approach was also rule-based, with rules and negation patterns designed from the dependency output of the Stanford parser. The machine-learning-based approach used a support vector machine classifier to build models from a number of features. A total of 284 English pathology reports of lymphoma were used in the study.

RESULTS The machine-learning-based approach had the best overall performance on the test set, with a micro-averaged F-score of 82.56%, while the syntax-based approach performed worst, with an F-score of 78.62%. The lexicon-based approach attained an overall average precision of 89.74% and recall of 76.09%, significantly better than the results achieved by Negation Tagger, which uses a similar approach.

DISCUSSION The lexicon-based approach benefitted more than the other two methods from being customized to the corpus. The errors of the syntax-based approach, which performed worst, were mainly due to poor parsing results, whereas the errors of the other two methods were probably caused by abnormal grammatical structures.

CONCLUSIONS A machine-learning-based approach has potential advantages for negation detection and may be preferable for the task. One possible way to improve overall performance is to apply different approaches to different sections of the reports.
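To make the idea of trigger terms and termination clues concrete, the following is a minimal sketch of a lexicon-based negation detector. It is not the system evaluated in the study: the trigger and termination lists, the function names, and the example sentence are all hypothetical, and the sketch assumes that input is a single sentence, that only triggers preceding an entity are handled, and that a negation scope runs from a trigger to the next termination clue or the end of the sentence.

```python
import re

# Hypothetical, illustrative lexicons (assumptions, not the study's lists).
TRIGGERS = ("no evidence of", "negative for", "without", "no", "not")
TERMINATORS = ("but", "however", "although", "except")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def negated_entities(sentence, entities):
    """Return the subset of `entities` that lies inside a negation scope."""
    tokens = tokenize(sentence)
    # Try longer triggers first so "no evidence of" wins over "no".
    trigger_seqs = sorted((t.split() for t in TRIGGERS), key=len, reverse=True)

    in_scope = [False] * len(tokens)
    active = False
    i = 0
    while i < len(tokens):
        matched = next(
            (seq for seq in trigger_seqs if tokens[i:i + len(seq)] == seq),
            None,
        )
        if matched:
            # A trigger opens a scope; the trigger itself is not in scope.
            active = True
            i += len(matched)
            continue
        if tokens[i] in TERMINATORS:
            # A termination clue closes the current scope.
            active = False
        in_scope[i] = active
        i += 1

    negated = set()
    for entity in entities:
        ent = tokenize(entity)
        if not ent:
            continue
        for start in range(len(tokens) - len(ent) + 1):
            span = slice(start, start + len(ent))
            if tokens[span] == ent and all(in_scope[span]):
                negated.add(entity)
                break
    return negated


sentence = "No evidence of necrosis or granuloma, but atypical lymphocytes are present."
entities = ["necrosis", "granuloma", "atypical lymphocytes"]
print(sorted(negated_entities(sentence, entities)))
# prints ['granuloma', 'necrosis']
```

In this example the termination clue "but" closes the scope opened by "no evidence of", so "atypical lymphocytes" is not marked as negated. A production system would need a much larger, corpus-specific lexicon, which is consistent with the abstract's remark that the lexicon-based approach benefitted from customization.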
