Two Decades of Unsupervised POS Induction: How Far Have We Come?

Part-of-speech (POS) induction is one of the most popular tasks in research on unsupervised NLP. Many different methods have been proposed, yet comparisons are difficult to make since there is little consensus on evaluation framework, and many papers evaluate against only one or two competitor systems. Here we evaluate seven different POS induction systems spanning nearly 20 years of work, using a variety of measures. We show that some of the oldest (and simplest) systems stand up surprisingly well against more recent approaches. Since most of these systems were developed and tested using data from the WSJ corpus, we compare their generalization abilities by testing on both WSJ and the multilingual Multext-East corpus. Finally, we introduce the idea of evaluating systems based on their ability to produce cluster prototypes that are useful as input to a prototype-driven learner. In most cases, the prototype-driven learner outperforms the unsupervised system used to initialize it, yielding state-of-the-art results on WSJ and improvements on non-English corpora.

[1]  Frank Keller,et al.  Evaluating Models of Syntactic Category Acquisition without Using a Gold Standard , 2009 .

[2]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[3]  Zoubin Ghahramani,et al.  The infinite HMM for unsupervised PoS tagging , 2009, EMNLP.

[4]  Regina Barzilay,et al.  Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches , 2009, J. Artif. Intell. Res..

[5]  John DeNero,et al.  Painless Unsupervised Learning with Features , 2010, NAACL.

[6]  Ben Taskar,et al.  Posterior vs Parameter Sparsity in Latent Variable Models , 2009, NIPS.

[7]  Mark Johnson,et al.  A Bayesian LDA-based model for semi-supervised part-of-speech tagging , 2007, NIPS.

[8]  Thomas L. Griffiths,et al.  A fully Bayesian approach to unsupervised part-of-speech tagging , 2007, ACL.

[9]  Alexander Clark,et al.  Combining Distributional and Morphological Information for Part of Speech Induction , 2003, EACL.

[10]  Kevin Knight,et al.  Minimized Models for Unsupervised Part-of-Speech Tagging , 2009, ACL.

[11]  Marina Meila,et al.  Comparing Clusterings by the Variation of Information , 2003, COLT.

[12]  Robert L. Mercer,et al.  Class-Based n-gram Models of Natural Language , 1992, CL.

[13]  Jianfeng Gao,et al.  A comparison of Bayesian estimators for unsupervised Hidden Markov Model POS taggers , 2008, EMNLP.

[14]  Christian Biemann,et al.  Unsupervised Part-of-Speech Tagging Employing Efficient Graph Clustering , 2006, ACL.

[15]  Dan Klein,et al.  Prototype-Driven Learning for Sequence Models , 2006, NAACL.

[16]  Bernard Mérialdo,et al.  Tagging English Text with a Probabilistic Model , 1994, CL.

[17]  Zoubin Ghahramani,et al.  Unsupervised and Constrained Dirichlet Process Mixture Models for Verb Clustering , 2009 .

[18]  Tomaz Erjavec,et al.  MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora , 2004, LREC.

[19]  Noah A. Smith,et al.  Contrastive Estimation: Training Log-Linear Models on Unlabeled Data , 2005, ACL.

[20]  Ben Taskar,et al.  Posterior Regularization for Structured Latent Variable Models , 2010, J. Mach. Learn. Res..

[21]  Julia Hirschberg,et al.  V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure , 2007, EMNLP.

[22]  Mark Johnson,et al.  Why Doesn’t EM Find Good HMM POS-Taggers? , 2007, EMNLP.