The infinite HMM for unsupervised PoS tagging

We extend previous work on fully unsupervised part-of-speech (PoS) tagging. Using a non-parametric version of the HMM, the infinite HMM (iHMM), we address the problem of choosing the number of hidden states in unsupervised Markov models for PoS tagging. We experiment with two non-parametric priors, the Dirichlet process and the Pitman-Yor process, on the Wall Street Journal dataset, using a parallelized implementation of an iHMM inference algorithm. We evaluate the results with a variety of clustering evaluation metrics and achieve performance equal to or better than previously reported results. Building on this promising result, we evaluate the output of the unsupervised PoS tagger as a direct replacement for the output of a fully supervised PoS tagger in the task of shallow parsing, and compare the two evaluations.
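To illustrate how the two priors leave the number of clusters open, the sketch below draws a partition from the Chinese restaurant process representation of the Pitman-Yor process; setting the discount to zero recovers the Dirichlet process. This is not the iHMM inference algorithm used in this work. The function name `sample_crp_partition` and its parameters are assumptions made for illustration only.

```python
import random


def sample_crp_partition(n_customers, concentration, discount=0.0, seed=0):
    """Draw table sizes from a Pitman-Yor Chinese restaurant process.

    discount == 0.0 gives the Dirichlet process; 0 < discount < 1 gives
    the heavier-tailed Pitman-Yor process. Names are illustrative only.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must exceed -discount")

    rng = random.Random(seed)
    table_sizes = []  # table_sizes[k] = number of customers at table k

    for _ in range(n_customers):
        k = len(table_sizes)
        if k == 0:
            # The first customer always opens the first table.
            table_sizes.append(1)
            continue
        # Existing table j is chosen with weight (size_j - discount);
        # a new table is opened with weight (concentration + discount * k).
        weights = [size - discount for size in table_sizes]
        weights.append(concentration + discount * k)
        choice = rng.choices(range(k + 1), weights=weights)[0]
        if choice == k:
            table_sizes.append(1)
        else:
            table_sizes[choice] += 1

    return table_sizes


if __name__ == "__main__":
    dp = sample_crp_partition(10000, concentration=1.0, discount=0.0)
    py = sample_crp_partition(10000, concentration=1.0, discount=0.5)
    print("DP tables:", len(dp), "PY tables:", len(py))
```

In both cases the number of occupied tables is not fixed in advance but grows with the data, which is the property that lets a model of this kind avoid a preset number of hidden states.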
