Paper information - Fast High-Accuracy Part-of-Speech Tagging by Independent Classifiers

Fast High-Accuracy Part-of-Speech Tagging by Independent Classifiers

Part-of-speech (POS) taggers can be quite accurate, but for practical use, accuracy often has to be sacrificed for speed. For example, the maintainers of the Stanford tagger (Toutanova et al., 2003; Manning, 2011) recommend tagging with a model whose per-tag error rate is 17% higher, in relative terms, than that of their most accurate model, to gain a factor of 10 or more in speed. In this paper, we treat POS tagging as a single-token independent multiclass classification task. We show that by using a rich feature set we can obtain high tagging accuracy within this framework, and by employing some novel feature-weight-combination and hypothesis-pruning techniques we can also get very fast tagging with this model. A prototype tagger implemented in Perl is tested and found to be at least 8 times faster than any publicly available tagger reported to have comparable accuracy on the standard Penn Treebank Wall Street Journal test set.
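To make the "single-token independent multiclass classification" framing concrete, the following is a minimal Python sketch, not the paper's Perl prototype. It assumes a plain linear model whose features come only from the surrounding words, so each token is tagged without reference to the tags predicted for its neighbors. The feature templates, the names `extract_features`, `score_tags`, and `tag_sentence`, and the toy weights are all illustrative assumptions; the rich feature set, feature-weight combination, and hypothesis pruning mentioned in the abstract are not reproduced here.

```python
# Hypothetical toy model: feature string -> {tag: weight}. These values are
# invented for illustration and are not taken from the paper.
TOY_WEIGHTS = {
    "w=the": {"DT": 2.0},
    "w=dog": {"NN": 2.0},
    "w=barks": {"VBZ": 1.0, "NNS": 0.5},
    "prev=dog": {"VBZ": 1.0},
}
TAGS = ["DT", "NN", "NNS", "VBZ"]


def extract_features(tokens, i):
    """Build features for token i from the words only, never from predicted tags."""
    word = tokens[i]
    prev_word = tokens[i - 1] if i > 0 else "<s>"
    next_word = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return [
        "w=" + word.lower(),
        "prev=" + prev_word.lower(),
        "next=" + next_word.lower(),
        "suffix3=" + word[-3:].lower(),
        "cap=" + str(word[:1].isupper()),
    ]


def score_tags(weights, features, tags):
    """Sum the weights of the active features separately for each candidate tag."""
    scores = {tag: 0.0 for tag in tags}
    for feature in features:
        for tag, weight in weights.get(feature, {}).items():
            if tag in scores:
                scores[tag] += weight
    return scores


def tag_sentence(weights, tokens, tags):
    """Tag every token independently by taking the highest-scoring tag."""
    result = []
    for i in range(len(tokens)):
        scores = score_tags(weights, extract_features(tokens, i), tags)
        result.append(max(tags, key=lambda t: scores[t]))
    return result


print(tag_sentence(TOY_WEIGHTS, ["The", "dog", "barks"], TAGS))
```

Because no decision depends on another token's predicted tag, there is no sequence search such as Viterbi decoding or beam search in this sketch; each position costs one feature extraction and one argmax over the tag set.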

Robert Moore

[1] Stanley F. Chen, et al. An Empirical Study of Smoothing Techniques for Language Modeling, 1996, ACL.

[2] Tong Zhang, et al. Solving large scale linear prediction problems using stochastic gradient descent algorithms, 2004, ICML.

[3] Dan Klein, et al. Structure compilation: trading structure for features, 2008, ICML.

[4] Christopher D. Manning. Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics?, 2011, CICLing.

[5] Lluís Màrquez i Villodre, et al. SVMTool: A general POS Tagger Generator Based on Support Vector Machines, 2004, LREC.

[6] Jan Hajic, et al. Semi-Supervised Training for the Averaged Perceptron POS Tagger, 2009, EACL.

[7] Adwait Ratnaparkhi, et al. A Maximum Entropy Model for Part-Of-Speech Tagging, 1996, EMNLP.

[8] Anders Søgaard, et al. Semi-supervised condensed nearest neighbor for part-of-speech tagging, 2011, ACL.

[9] Giorgio Satta, et al. Guided Learning for Bidirectional Sequence Classification, 2007, ACL.

[10] Inderjit S. Dhillon, et al. A Divisive Information-Theoretic Feature Clustering Algorithm for Text Classification, 2003, J. Mach. Learn. Res.

[11] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.