Improved Unsupervised POS Induction Using Intrinsic Clustering Quality and a Zipfian Constraint

Modern unsupervised POS taggers usually apply an optimization procedure to a non-convex function, and tend to converge to local maxima that are sensitive to starting conditions. The quality of the tagging induced by such algorithms is thus highly variable, and researchers report average results over several random initializations. Consequently, applications are not guaranteed to use an induced tagging of the quality reported for the algorithm. In this paper we address this issue using an unsupervised test for intrinsic clustering quality. We run a base tagger with different random initializations, and select the best tagging using the quality test. As a base tagger, we modify a leading unsupervised POS tagger (Clark, 2003) to constrain the distributions of word types across clusters to be Zipfian, allowing us to utilize a perplexity-based quality test. We show that the correlation between our quality test and gold standard-based tagging quality measures is high. Our results are better in most evaluation measures than all results reported in the literature for this task, and are always better than the Clark average results.
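The selection procedure described above (several randomly initialized runs of a base tagger, then an unsupervised quality test to pick one) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `toy_base_tagger` is a hypothetical stand-in that assigns word types to clusters at random, not the modified Clark (2003) tagger; `class_bigram_perplexity` is a simple class-based bigram perplexity estimated on the same corpus, not necessarily the paper's exact test; and the sketch assumes that lower perplexity indicates a better tagging. The Zipfian constraint is not modeled.

```python
import math
import random
from collections import Counter


def toy_base_tagger(words, num_clusters, seed):
    """Hypothetical stand-in for a base tagger: maps each word type to a
    random cluster. A real tagger would optimize an objective from this start."""
    rng = random.Random(seed)
    return {w: rng.randrange(num_clusters) for w in sorted(set(words))}


def class_bigram_perplexity(words, tagging):
    """Perplexity under p(w_i | w_{i-1}) = p(c_i | c_{i-1}) * p(w_i | c_i),
    with maximum-likelihood estimates taken from the same corpus (illustrative)."""
    tags = [tagging[w] for w in words]
    word_count = Counter(words)
    tag_count = Counter(tags)
    prev_count = Counter(tags[:-1])
    bigram_count = Counter(zip(tags, tags[1:]))
    log_prob = 0.0
    for prev, cur, word in zip(tags, tags[1:], words[1:]):
        p_trans = bigram_count[(prev, cur)] / prev_count[prev]
        p_emit = word_count[word] / tag_count[cur]
        log_prob += math.log(p_trans * p_emit)
    return math.exp(-log_prob / (len(words) - 1))


def select_best_tagging(words, num_clusters, seeds,
                        tagger=toy_base_tagger,
                        quality=class_bigram_perplexity):
    """Run the tagger once per seed and keep the tagging with the lowest
    quality score (assumption: lower perplexity means better tagging)."""
    scores = {}
    taggings = {}
    for seed in seeds:
        taggings[seed] = tagger(words, num_clusters, seed)
        scores[seed] = quality(words, taggings[seed])
    best_seed = min(scores, key=scores.get)
    return taggings[best_seed], best_seed, scores


corpus = "the dog saw the cat and the cat saw a dog".split()
best, seed, scores = select_best_tagging(corpus, num_clusters=3, seeds=range(5))
print("selected seed:", seed, "perplexity:", round(scores[seed], 3))
```

The point of the design is that selection needs no gold-standard tags: the score is computed from the corpus and the induced tagging alone, so an application can use the selected run directly rather than relying on an average over runs.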

[1]  Dale Schuurmans,et al.  The latent maximum entropy principle , 2002, Proceedings IEEE International Symposium on Information Theory.

[2]  J. Munkres  Algorithms for the Assignment and Transportation Problems , 1957 .

[3]  Ben Taskar,et al.  Posterior vs Parameter Sparsity in Latent Variable Models , 2009, NIPS.

[4]  Regina Barzilay,et al.  Adding More Languages Improves Unsupervised Multilingual Part-of-Speech Tagging: a Bayesian Non-Parametric Approach , 2009, NAACL.

[5]  Eric Brill,et al.  Unsupervised Learning of Disambiguation Rules for Part of Speech Tagging , 1995, VLC@ACL.

[6]  Robert L. Mercer,et al.  Class-Based n-gram Models of Natural Language , 1992, CL.

[7]  Jianfeng Gao,et al.  A comparison of Bayesian estimators for unsupervised Hidden Markov Model POS taggers , 2008, EMNLP.

[8]  Christian Biemann,et al.  Unsupervised Part-of-Speech Tagging Employing Efficient Graph Clustering , 2006, ACL.

[9]  Dan Klein,et al.  Prototype-Driven Learning for Sequence Models , 2006, NAACL.

[10]  Bernard Mérialdo,et al.  Tagging English Text with a Probabilistic Model , 1994, CL.

[11]  Hinrich Schütze,et al.  Distributional Part-of-Speech Tagging , 1995, EACL.

[12]  Thomas L. Griffiths,et al.  A fully Bayesian approach to unsupervised part-of-speech tagging , 2007, ACL.

[13]  Hermann Ney,et al.  Algorithms for bigram and trigram word clustering , 1995, Speech Commun..

[14]  Harold W. Kuhn,et al.  The Hungarian method for the assignment problem , 1955, 50 Years of Integer Programming.

[15]  Hermann Ney,et al.  On structuring probabilistic dependences in stochastic language modelling , 1994, Comput. Speech Lang..

[16]  M. Meilă Comparing clusterings---an information based distance , 2007 .

[17]  Rose,et al.  Statistical mechanics and phase transitions in clustering. , 1990, Physical review letters.

[18]  Atro Voutilainen Part-of-Speech Tagging , 2005 .

[19]  Regina Barzilay,et al.  Unsupervised Multilingual Learning for POS Tagging , 2008, EMNLP.

[20]  Alexander Clark,et al.  Combining Distributional and Morphological Information for Part of Speech Induction , 2003, EACL.

[22]  Michael Mitzenmacher,et al.  A Brief History of Generative Models for Power Law and Lognormal Distributions , 2004, Internet Math..

[23]  Kevin Knight,et al.  Minimized Models for Unsupervised Part-of-Speech Tagging , 2009, ACL.

[24]  Michele Banko,et al.  Part-of-Speech Tagging in Context , 2004, COLING.

[25]  Julia Hirschberg,et al.  V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure , 2007, EMNLP.

[26]  Noah A. Smith,et al.  Contrastive Estimation: Training Log-Linear Models on Unlabeled Data , 2005, ACL.

[27]  Joshua Goodman,et al.  A bit of progress in language modeling , 2001, Comput. Speech Lang..

[28]  Vincent Ng,et al.  Unsupervised Part-of-Speech Acquisition for Resource-Scarce Languages , 2007, EMNLP-CoNLL.

[29]  R. Forthofer,et al.  Rank Correlation Methods , 1981 .

[30]  Mirella Lapata,et al.  Automatic Evaluation of Information Ordering: Kendall’s Tau , 2006, CL.

[31]  Ari Rappoport,et al.  The NVI Clustering Evaluation Measure , 2009, CoNLL.

[32]  M. Kendall,et al.  Rank Correlation Methods , 1949 .

[33]  Zoubin Ghahramani,et al.  The infinite HMM for unsupervised PoS tagging , 2009, EMNLP.

[34]  Q.I. Wang,et al.  Improved estimation for unsupervised part-of-speech tagging , 2005, 2005 International Conference on Natural Language Processing and Knowledge Engineering.

[35]  Thomas L. Griffiths,et al.  Interpolating between types and tokens by estimating power-law generators , 2005, NIPS.

[36]  Yoav Goldberg,et al.  EM Can Find Pretty Good HMM POS-Taggers (When Given a Good Start) , 2008, ACL.

[37]  Dayne Freitag,et al.  Toward Unsupervised Whole-Corpus Tagging , 2004, COLING.

[38]  Mitch Marcus,et al.  A Simple Unsupervised Learner for POS Disambiguation Rules Given Only a Minimal Lexicon , 2009, EMNLP.

[39]  Noah A. Smith,et al.  Annealing Techniques For Unsupervised Statistical Language Learning , 2004, ACL.

[40]  Joseph P. Levy,et al.  Connectionist models of memory and language , 1995 .