Tagging Medical Documents with High Accuracy

We ran both Brill's rule-based tagger and TnT, a statistical tagger, with a default German newspaper-language model on a medical text corpus. Supplied with only limited lexicon resources, TnT outperforms the Brill tagger and reaches state-of-the-art performance figures (close to 97% accuracy). We then trained TnT on a large annotated medical text corpus, using a slightly extended tagset that captures certain peculiarities of medical language, and achieved 98% tagging accuracy. Hence, off-the-shelf statistical POS taggers not only can be reused immediately for medical NLP but also, when trained on medical corpora, achieve a higher performance level than they do on the newspaper genre.
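
To make the statistical approach more concrete, here is a minimal Python sketch of a TnT-style tagger: a second-order hidden Markov model whose tag-trigram probabilities are linearly interpolated with bigram and unigram estimates (weights set by deleted interpolation), decoded with Viterbi search, plus a per-token accuracy measure. It is an illustration under stated assumptions, not the tagger evaluated in this work: TnT's suffix-based guessing for unknown words is replaced by a hypothetical constant UNK_EMISSION, there is no end-of-sentence transition or beam pruning, and the names TrigramTagger and accuracy, as well as the toy sentences with STTS-style tags (ART, NN, VVFIN), are invented for the example.

```python
import math
from collections import Counter

START = "<s>"        # padding tag for the two positions before a sentence
UNK_EMISSION = 1e-6  # hypothetical constant; real TnT uses suffix statistics instead


class TrigramTagger:
    """Simplified second-order HMM tagger in the style of TnT (illustrative sketch)."""

    def __init__(self, tagged_sents):
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        self.hist1, self.hist2 = Counter(), Counter()  # counts of 1- and 2-tag histories
        self.emit = Counter()                          # (tag, word) -> count
        self.vocab = set()
        for sent in tagged_sents:
            tags = [START, START] + [t for _, t in sent]
            for w, t in sent:
                self.emit[(t, w)] += 1
                self.vocab.add(w)
            for i in range(2, len(tags)):
                a, b, c = tags[i - 2], tags[i - 1], tags[i]
                self.uni[c] += 1
                self.bi[(b, c)] += 1
                self.hist1[b] += 1
                self.tri[(a, b, c)] += 1
                self.hist2[(a, b)] += 1
        self.n = sum(self.uni.values())
        self.tags = sorted(self.uni)
        self.lambdas = self._deleted_interpolation()

    def _deleted_interpolation(self):
        """Estimate unigram/bigram/trigram weights by deleted interpolation."""
        lam = [0.0, 0.0, 0.0]
        for (a, b, c), f in self.tri.items():
            h2, h1 = self.hist2[(a, b)], self.hist1[b]
            c3 = (f - 1) / (h2 - 1) if h2 > 1 else 0.0
            c2 = (self.bi[(b, c)] - 1) / (h1 - 1) if h1 > 1 else 0.0
            c1 = (self.uni[c] - 1) / (self.n - 1) if self.n > 1 else 0.0
            scores = [c1, c2, c3]
            best = max(range(3), key=lambda k: (scores[k], k))  # ties favour higher order
            lam[best] += f
        total = sum(lam)
        return [x / total for x in lam]

    def transition(self, a, b, c):
        """Interpolated P(c | a, b)."""
        l1, l2, l3 = self.lambdas
        p1 = self.uni[c] / self.n
        p2 = self.bi[(b, c)] / self.hist1[b] if self.hist1[b] else 0.0
        p3 = self.tri[(a, b, c)] / self.hist2[(a, b)] if self.hist2[(a, b)] else 0.0
        return l1 * p1 + l2 * p2 + l3 * p3

    def emission(self, word, tag):
        """P(word | tag); every tag gets the same constant for unseen words."""
        if word not in self.vocab:
            return UNK_EMISSION
        return self.emit[(tag, word)] / self.uni[tag]

    def tag(self, words):
        """Viterbi search over tag-pair states (t_{i-1}, t_i)."""
        best = {(START, START): (0.0, [])}
        for w in words:
            new = {}
            for (a, b), (score, path) in best.items():
                for c in self.tags:
                    e = self.emission(w, c)
                    if e == 0.0:
                        continue  # known word never seen with this tag
                    p = max(self.transition(a, b, c), 1e-12)  # floor avoids log(0)
                    s = score + math.log(p) + math.log(e)
                    if (b, c) not in new or s > new[(b, c)][0]:
                        new[(b, c)] = (s, path + [c])
            best = new
        return max(best.values(), key=lambda v: v[0])[1]


def accuracy(model, gold_sents):
    """Share of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in gold_sents:
        pred = model.tag([w for w, _ in sent])
        correct += sum(p == g for p, (_, g) in zip(pred, sent))
        total += len(sent)
    return correct / total


if __name__ == "__main__":
    corpus = [
        [("Der", "ART"), ("Patient", "NN"), ("klagt", "VVFIN")],
        [("Die", "ART"), ("Ärztin", "NN"), ("untersucht", "VVFIN"),
         ("den", "ART"), ("Patienten", "NN")],
    ]
    model = TrigramTagger(corpus)
    print(model.tag(["Die", "Schwester", "klagt"]))  # ['ART', 'NN', 'VVFIN']
    print(accuracy(model, corpus))                   # 1.0
```

Because every tag receives the same constant emission for an unseen word such as "Schwester", only the interpolated tag context decides its tag here; the real TnT additionally consults suffix statistics at this point.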

Udo Hahn | Joachim Wermter