A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors

All types of part-of-speech (POS) tagging errors have been equally treated by existing taggers. However, the errors are not equally important, since some errors affect the performance of subsequent natural language processing seriously while others do not. This paper aims to minimize these serious errors while retaining the overall performance of POS tagging. Two gradient loss functions are proposed to reflect the different types of errors. They are designed to assign a larger cost for serious errors and a smaller cost for minor errors. Through a series of experiments, it is shown that the classifier trained with the proposed loss functions not only reduces serious errors but also achieves slightly higher accuracy than ordinary classifiers.

[1]  Anders Søgaard,et al.  Semi-supervised condensed nearest neighbor for part-of-speech tagging , 2011, ACL.

[2]  Yuji Matsumoto,et al.  Applying Conditional Random Fields to Japanese Morphological Analysis , 2004, EMNLP.

[3]  Thomas L. Griffiths,et al.  A fully Bayesian approach to unsupervised part-of-speech tagging , 2007, ACL.

[4]  Lluís Màrquez i Villodre,et al.  SVMTool: A general POS Tagger Generator Based on Support Vector Machines , 2004, LREC.

[5]  Fernando Pereira,et al.  Shallow Parsing with Conditional Random Fields , 2003, NAACL.

[6]  Jun'ichi Tsujii,et al.  Bidirectional Inference with the Easiest-First Strategy for Tagging Sequence Data , 2005, HLT.

[7]  Christopher D. Manning,et al.  Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger , 2000, EMNLP.

[8]  Dan Klein,et al.  Prototype-Driven Learning for Sequence Models , 2006, NAACL.

[9]  Thomas Hofmann,et al.  Hierarchical document categorization with support vector machines , 2004, CIKM '04.

[10]  Kwang-Eun Ko,et al.  Occluded Object Motion Estimation System based on Particle Filter with 3D Reconstruction , 2012, Int. J. Fuzzy Log. Intell. Syst..

[11]  Xu Sun,et al.  Modeling Latent-Dynamic in Shallow Parsing: A Latent Conditional Model with Improved Inference , 2008, COLING.

[12]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[13]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[14]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[15]  Thomas Hofmann,et al.  Support vector machine learning for interdependent and structured output spaces , 2004, ICML.

[16]  Giorgio Satta,et al.  Guided Learning for Bidirectional Sequence Classification , 2007, ACL.

[17]  Zhi-Hua Zhou,et al.  ON MULTI‐CLASS COST‐SENSITIVE LEARNING , 2006, Comput. Intell..

[18]  Slav Petrov,et al.  Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models , 2010, EMNLP.

[19]  Thorsten Brants,et al.  TnT – A Statistical Part-of-Speech Tagger , 2000, ANLP.

[20]  Mitch Marcus,et al.  A Simple Unsupervised Learner for POS Disambiguation Rules Given Only a Minimal Lexicon , 2009, EMNLP.

[21]  Slav Petrov,et al.  Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections , 2011, ACL.

[22]  Ben Taskar,et al.  Posterior vs Parameter Sparsity in Latent Variable Models , 2009, NIPS.

[23]  Charles Elkan,et al.  The Foundations of Cost-Sensitive Learning , 2001, IJCAI.

[24]  John DeNero,et al.  Painless Unsupervised Learning with Features , 2010, NAACL.

[25]  Yong-Nyuo Shin Implementation Privacy Reference Architecture for Forensic Readiness , 2012, Int. J. Fuzzy Log. Intell. Syst..

[26]  Thomas Hofmann,et al.  Investigating Loss Functions and Optimization Methods for Discriminative Learning of Label Sequences , 2003, EMNLP.

[27]  Jan Hajic,et al.  Semi-Supervised Training for the Averaged Perceptron POS Tagger , 2009, EACL.

[28]  Adwait Ratnaparkhi,et al.  A Maximum Entropy Model for Part-Of-Speech Tagging , 1996, EMNLP.

[29]  Yuji Matsumoto,et al.  Unknown Word Guessing and Part-of-Speech Tagging Using Support Vector Machines , 2001, NLPRS.

[30]  Dan Klein,et al.  Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network , 2003, NAACL.

[31]  Christopher D. Manning Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics? , 2011, CICLing.

[32]  Mark Johnson,et al.  Why Doesn’t EM Find Good HMM POS-Taggers? , 2007, EMNLP.

[33]  Sung-Hae Jun,et al.  Support Vector Machine based on Stratified Sampling , 2009, Int. J. Fuzzy Log. Intell. Syst..