Weakly Supervised POS Tagging without Disambiguation

Weakly supervised part-of-speech (POS) tagging is the task of learning to predict the POS tag of a given word in context from partially annotated data rather than from fully tagged corpora. It would benefit natural language processing applications in languages for which tagged corpora are mostly unavailable. In this article, we propose a novel framework for weakly supervised POS tagging based on a dictionary of words with their possible POS tags. In the constrained error-correcting output codes (ECOC)-based approach, a unique L-bit vector is assigned to each POS tag. The set of bit vectors is referred to as a coding matrix with values in {+1, -1}. Each column of the coding matrix specifies a dichotomy over the tag space, for which a binary classifier is learned. The training data for each binary classifier is generated as follows: a word paired with its set of possible POS tags is used as a positive training example only if the whole set of its possible tags falls into the positive side of the dichotomy specified by the column coding; negative training examples are generated analogously. Given a word in context, its POS tag is predicted by concatenating the outputs of the L binary classifiers and choosing the tag whose codeword is closest according to some distance measure. By incorporating the ECOC strategy, the set of all possible tags for each word is treated as a whole, without the need for disambiguation. Moreover, instead of the manual feature engineering employed in most previous POS tagging approaches, features for training and testing in the proposed framework are generated automatically using neural language modeling. The proposed framework has been evaluated on three corpora for English, Italian, and Malagasy POS tagging, achieving accuracies of 93.21%, 90.9%, and 84.5%, respectively, a significant improvement over state-of-the-art approaches.
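The training-set construction and decoding steps described in the abstract can be illustrated with a minimal Python sketch. The three-tag inventory, the 4-bit coding matrix, and the function names `build_binary_training_sets` and `decode` are hypothetical and chosen only for illustration. Hamming distance is assumed as the distance measure, since the abstract leaves the measure unspecified. The binary classifiers themselves and the neural feature extraction are omitted.

```python
from typing import Dict, List, Sequence, Set, Tuple

Features = Tuple[float, ...]

# Hypothetical coding matrix: one 4-bit codeword (+1/-1) per tag.
# Each column defines one dichotomy over the tag space.
CODING_MATRIX: Dict[str, List[int]] = {
    "NOUN": [+1, +1, -1, -1],
    "VERB": [+1, -1, +1, -1],
    "ADJ":  [-1, +1, +1, +1],
}


def build_binary_training_sets(
    examples: Sequence[Tuple[Features, Set[str]]],
    coding_matrix: Dict[str, List[int]],
) -> List[List[Tuple[Features, int]]]:
    """Build one binary training set per column of the coding matrix.

    An example is used for a column only if ALL of its candidate tags
    fall on the same side of that column's dichotomy; otherwise it is
    skipped for that column. No candidate tag is ever singled out.
    """
    n_bits = len(next(iter(coding_matrix.values())))
    datasets: List[List[Tuple[Features, int]]] = [[] for _ in range(n_bits)]
    for features, candidates in examples:
        for bit in range(n_bits):
            sides = {coding_matrix[tag][bit] for tag in candidates}
            if len(sides) == 1:  # whole candidate set on one side
                datasets[bit].append((features, sides.pop()))
    return datasets


def decode(outputs: Sequence[int], coding_matrix: Dict[str, List[int]]) -> str:
    """Return the tag whose codeword is nearest to the classifier outputs.

    Hamming distance is an assumption made for this sketch.
    """
    def hamming(code: Sequence[int]) -> int:
        return sum(o != c for o, c in zip(outputs, code))

    return min(coding_matrix, key=lambda tag: hamming(coding_matrix[tag]))


# Usage: an ambiguous word (NOUN or VERB) and an unambiguous one (ADJ).
examples = [
    ((0.1, 0.2), {"NOUN", "VERB"}),
    ((0.9, 0.4), {"ADJ"}),
]
datasets = build_binary_training_sets(examples, CODING_MATRIX)
predicted = decode([+1, -1, +1, -1], CODING_MATRIX)
```

In this toy setup the ambiguous word contributes to the first and fourth classifiers only, because NOUN and VERB disagree on the second and third columns. This is how the candidate set is used as a whole without picking a single tag.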
