Implementation and Evaluation of a Negation Tagger in a Pipeline-based System for Information Extraction from Pathology Reports

We have developed a pipeline-based system for automated annotation of Surgical Pathology Reports with UMLS terms that builds on GATE – an open-source architecture for language engineering. The system includes a module for detecting and annotating negated concepts, which implements the NegEx algorithm – an algorithm originally described for use in discharge summaries and radiology reports. We describe the implementation of the system, and early evaluation of the Negation Tagger. Our results are encouraging. In the key Final Diagnosis section, with almost no modification of the algorithm or phrase lists, the system performs with precision of 0.84 and recall of 0.80 against a gold-standard corpus of negation annotations, created by modified Delphi technique by a panel of pathologists. Further work will focus on refining the Negation Tagger and UMLS Tagger and adding additional processing resources for annotating freetext pathology reports.