Semantic Role Labeling of Clinical Text: Comparing Syntactic Parsers and Features

Semantic role labeling (SRL), which extracts shallow semantic relation representation from different surface textual forms of free text sentences, is important for understanding clinical narratives. Since semantic roles are formed by syntactic constituents in the sentence, an effective parser, as well as an effective syntactic feature set are essential to build a practical SRL system. Our study initiates a formal evaluation and comparison of SRL performance on a clinical text corpus MiPACQ, using three state-of-the-art parsers, the Stanford parser, the Berkeley parser, and the Charniak parser. First, the original parsers trained on the open domain syntactic corpus Penn Treebank were employed. Next, those parsers were retrained on the clinical Treebank of MiPACQ for further comparison. Additionally, state-of-the-art syntactic features from open domain SRL were also examined for clinical text. Experimental results showed that retraining the parsers on clinical Treebank improved the performance significantly, with an optimal F1 measure of 71.41% achieved by the Berkeley parser.