Parsing clinical text: how good are the state-of-the-art parsers?

BackgroundParsing, which generates a syntactic structure of a sentence (a parse tree), is a critical component of natural language processing (NLP) research in any domain including medicine. Although parsers developed in the general English domain, such as the Stanford parser, have been applied to clinical text, there are no formal evaluations and comparisons of their performance in the medical domain.MethodsIn this study, we investigated the performance of three state-of-the-art parsers: the Stanford parser, the Bikel parser, and the Charniak parser, using following two datasets: (1) A Treebank containing 1,100 sentences that were randomly selected from progress notes used in the 2010 i2b2 NLP challenge and manually annotated according to a Penn Treebank based guideline; and (2) the MiPACQ Treebank, which is developed based on pathology notes and clinical notes, containing 13,091 sentences. We conducted three experiments on both datasets. First, we measured the performance of the three state-of-the-art parsers on the clinical Treebanks with their default settings. Then we re-trained the parsers using the clinical Treebanks and evaluated their performance using the 10-fold cross validation method. Finally we re-trained the parsers by combining the clinical Treebanks with the Penn Treebank.ResultsOur results showed that the original parsers achieved lower performance in clinical text (Bracketing F-measure in the range of 66.6%-70.3%) compared to general English text. After retraining on the clinical Treebank, all parsers achieved better performance, with the best performance from the Stanford parser that reached the highest Bracketing F-measure of 73.68% on progress notes and 83.72% on the MiPACQ corpus using 10-fold cross validation. When the combined clinical Treebanks and Penn Treebank was used, of the three parsers, the Charniak parser achieved the highest Bracketing F-measure of 73.53% on progress notes and the Stanford parser reached the highest F-measure of 84.15% on the MiPACQ corpus.ConclusionsOur study demonstrates that re-training using clinical Treebanks is critical for improving general English parsers' performance on clinical text, and combining clinical and open domain corpora might achieve optimal performance for parsing clinical text.

[1]  M Chetana,et al.  Effect of Dianex, a herbal formulation on experimentally induced diabetes mellitus , 2005, Phytotherapy research : PTR.

[2]  J. Austin,et al.  Use of natural language processing to translate clinical information from a database of 889,921 chest radiographic reports. , 2002, Radiology.

[3]  Dan Klein,et al.  Accurate Unlexicalized Parsing , 2003, ACL.

[4]  John F. Hurdle,et al.  Extracting Information from Textual Documents in the Electronic Health Record: A Review of Recent Research , 2008, Yearbook of Medical Informatics.

[5]  David M. Magerman Statistical Decision-Tree Models for Parsing , 1995, ACL.

[6]  Carol Friedman,et al.  Two biomedical sublanguages: a description based on the theories of Zellig Harris , 2002, J. Biomed. Informatics.

[7]  Anderson Spickard,et al.  Research Paper: "Understanding" Medical School Curriculum Content Using KnowledgeMap , 2003, J. Am. Medical Informatics Assoc..

[8]  Nicoletta Calzolari,et al.  Review of Medical language processing: computer management of narrative data by Naomi Sager, Carol Friedman, and Margaret S. Lyman. Addison-Wesley 1987. , 1989 .

[9]  Eugene Charniak,et al.  Effective Self-Training for Parsing , 2006, NAACL.

[10]  Matthew Lease,et al.  Parsing Biomedical Literature , 2005, IJCNLP.

[11]  Alan R. Aronson,et al.  Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program , 2001, AMIA.

[12]  Eugene Charniak,et al.  Any Domain Parsing: Automatic Domain Adaptation for Natural Language Parsing , 2010 .

[13]  Brian Roark,et al.  Supervised and unsupervised PCFG adaptation to novel domains , 2003, NAACL.

[14]  Dan Klein,et al.  Improved Inference for Unlexicalized Parsing , 2007, NAACL.

[15]  P J Haug,et al.  Experience with a mixed semantic/syntactic parser. , 1995, Proceedings. Symposium on Computer Applications in Medical Care.

[16]  Hua Xu,et al.  Syntactic parsing of clinical text: guideline and corpus development with handling ill-formed sentences. , 2013, Journal of the American Medical Informatics Association : JAMIA.

[17]  Eugene Charniak,et al.  Reranking and Self-Training for Parser Adaptation , 2006, ACL.

[18]  Son Doan,et al.  Application of information technology: MedEx: a medication information extraction system for clinical narratives , 2010, J. Am. Medical Informatics Assoc..

[19]  Carol Friedman,et al.  Research Paper: A General Natural-language Text Processor for Clinical Radiology , 1994, J. Am. Medical Informatics Assoc..

[20]  Mitchell P. Marcus,et al.  On the parameter space of generative lexicalized statistical parsing models , 2004 .

[21]  Rodney D. Nielsen,et al.  Towards comprehensive syntactic and semantic annotations of the clinical narrative , 2013, J. Am. Medical Informatics Assoc..

[22]  Eugene Charniak,et al.  Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking , 2005, ACL.

[23]  Peter J. Haug,et al.  A natural language parsing system for encoding admitting diagnoses , 1997, AMIA.

[24]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[25]  Sunghwan Sohn,et al.  Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications , 2010, J. Am. Medical Informatics Assoc..

[26]  Michael Collins,et al.  Three Generative, Lexicalised Models for Statistical Parsing , 1997, ACL.

[27]  Dan Klein,et al.  Improved Identification of Noun Phrases in Clinical Radiology Reports Using a High-Performance Statistical Natural Language Parser Augmented with the UMLS Specialist Lexicon , 2005 .