Analyzing the Errors of Unsupervised Learning

We identify four types of errors that unsupervised induction systems make and study each one in turn. Our contributions include (1) using a meta-model to analyze the incorrect biases of a model in a systematic way, (2) providing an efficient and robust method of measuring distance between two parameter settings of a model, and (3) showing that local optima issues which typically plague EM can be somewhat alleviated by increasing the number of training examples. We conduct our analyses on three models: the HMM, the PCFG, and a simple dependency model.

[1]  Fernando Pereira,et al.  Inside-Outside Reestimation From Partially Bracketed Corpora , 1992, HLT.

[2]  Glenn Carroll,et al.  Two Experiments on Learning Probabilistic Dependency Grammars from Corpora , 1992 .

[3]  Bernard Mérialdo,et al.  Tagging English Text with a Probabilistic Model , 1994, CL.

[4]  Dana Ron,et al.  On the learnability and usage of acyclic probabilistic finite automata , 1995, COLT '95.

[5]  Sanjoy Dasgupta,et al.  Learning mixtures of Gaussians , 1999, 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039).

[6]  Pieter W. Adriaans,et al.  Learning Shallow Context-free Languages under Simple Distributions , 2001 .

[7]  Alexander Clark Unsupervised induction of stochastic context-free grammars using distributional clustering , 2001, CoNLL.

[8]  Alexander Clark,et al.  PAC-learnability of Probabilistic Deterministic Finite State Automata , 2004, J. Mach. Learn. Res..

[9]  Dan Klein,et al.  Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency , 2004, ACL.

[10]  Dan Klein,et al.  Unsupervised Learning of Field Segmentation Models for Information Extraction , 2005, ACL.

[11]  Noah A. Smith,et al.  Contrastive Estimation: Training Log-Linear Models on Unlabeled Data , 2005, ACL.

[12]  J. Feldman,et al.  Learning mixtures of product distributions over discrete domains , 2005, 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05).

[13]  Noah A. Smith,et al.  Annealing Structural Bias in Multilingual Weighted Grammar Induction , 2006, ACL.

[14]  Gregory Shakhnarovich,et al.  An investigation of computational and informational limits in Gaussian mixture clustering , 2006, ICML '06.

[15]  Kenichi Kurihara,et al.  Variational Bayesian Grammar Induction for Natural Language , 2006, ICGI.

[16]  Thomas L. Griffiths,et al.  A fully Bayesian approach to unsupervised part-of-speech tagging , 2007, ACL.

[17]  Sanjoy Dasgupta,et al.  A Probabilistic Analysis of EM for Mixtures of Separated, Spherical Gaussians , 2007, J. Mach. Learn. Res..

[18]  Mark Johnson,et al.  Why Doesn’t EM Find Good HMM POS-Taggers? , 2007, EMNLP.

[19]  Harold W. Kuhn,et al.  The Hungarian method for the assignment problem , 1955, 50 Years of Integer Programming.