Improving Part-of-Speech Tagging for NLP Pipelines

This paper presents results from sentence-level, linguistics-based rules for improving part-of-speech tagging. It is well known that the performance of a complex NLP system degrades when any of its preliminary stages is less than perfect: errors in the early stages of the pipeline snowball and reduce its end performance. We have created a set of sentence-level, linguistics-based rules that adjust the part-of-speech tags produced by state-of-the-art taggers. Comparison with state-of-the-art taggers on widely used benchmarks demonstrates significant improvements in tagging accuracy and, consequently, in the quality and accuracy of NLP systems.
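To make the idea of rule-based tag adjustment concrete, the following is a minimal sketch of how a post-correction pass over a tagger's output might look. The paper's actual rules are not given in this abstract, so the single rule here (retagging a base-form verb as a noun when it directly follows a determiner), the function names, and the example sentence are all hypothetical illustrations. Tags are assumed to follow the Penn Treebank tagset.

```python
from typing import Callable, List, Tuple

# A tagged sentence is a list of (token, tag) pairs; Penn Treebank tags are assumed.
TaggedSentence = List[Tuple[str, str]]
Rule = Callable[[TaggedSentence], TaggedSentence]


def fix_verb_after_determiner(sentence: TaggedSentence) -> TaggedSentence:
    """Hypothetical rule: a word tagged VB directly after a DT is retagged NN."""
    corrected = list(sentence)  # copy so the tagger's original output is untouched
    for i in range(1, len(corrected)):
        prev_tag = corrected[i - 1][1]
        token, tag = corrected[i]
        if prev_tag == "DT" and tag == "VB":
            corrected[i] = (token, "NN")
    return corrected


# Illustrative rule set; a real system would hold many such rules.
RULES: List[Rule] = [fix_verb_after_determiner]


def apply_rules(sentence: TaggedSentence, rules: List[Rule] = RULES) -> TaggedSentence:
    """Run each rule in order over the whole sentence."""
    for rule in rules:
        sentence = rule(sentence)
    return sentence


# Made-up tagger output containing one error ("run" tagged as a verb).
tagger_output: TaggedSentence = [("The", "DT"), ("run", "VB"), ("was", "VBD"), ("long", "JJ")]
corrected = apply_rules(tagger_output)
```

Because each rule receives the whole sentence rather than a fixed window of neighbouring tags, a rule of this shape could in principle consult context anywhere in the sentence, which is one plausible reading of "sentence level" in the abstract.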
