Identifiability and Unmixing of Latent Parse Trees

This paper explores unsupervised learning of parsing models along two directions. First, which models are identifiable from infinite data? We use a general technique for numerically checking identifiability based on the rank of a Jacobian matrix, and apply it to several standard constituency and dependency parsing models. Second, for identifiable models, how do we estimate the parameters efficiently? EM suffers from local optima, while recent work using spectral methods [1] cannot be directly applied since the topology of the parse tree varies across sentences. We develop a strategy, unmixing, which deals with this additional complexity for restricted classes of parsing models.
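
The Jacobian-based identifiability check mentioned in the abstract can be illustrated with a small sketch. The snippet below is not the paper's implementation: it uses an assumed toy model (a two-state naive Bayes model over two binary observations, with illustrative parameter names pi, a, b), differentiates the map from parameters to observable moments, and inspects the rank of the Jacobian at a random parameter point.

```python
# Minimal sketch (not the authors' code) of a numerical identifiability check:
# differentiate the parameters-to-moments map and inspect the Jacobian's rank
# at a random parameter point. The toy model is an assumption made only to
# keep the example self-contained.
import jax
import jax.numpy as jnp

def moments(theta):
    """Map parameters to the flattened observable joint p(x1, x2)."""
    pi, a, b = theta
    pz = jnp.array([pi, 1.0 - pi])                 # latent state prior p(z)
    e1 = jnp.array([[a, 1.0 - a], [1.0 - a, a]])   # emission p(x1 | z)
    e2 = jnp.array([[b, 1.0 - b], [1.0 - b, b]])   # emission p(x2 | z)
    # p(x1, x2) = sum_z p(z) p(x1 | z) p(x2 | z), flattened into a vector
    joint = jnp.einsum('z,zi,zj->ij', pz, e1, e2)
    return joint.reshape(-1)

theta0 = jax.random.uniform(jax.random.PRNGKey(0), (3,), minval=0.1, maxval=0.9)
J = jax.jacfwd(moments)(theta0)                    # Jacobian, shape (4, 3)
rank = jnp.linalg.matrix_rank(J)
# Generic local identifiability requires full column rank (one per free
# parameter), up to known symmetries such as relabeling the latent states.
print(f"Jacobian rank = {rank}, number of parameters = {theta0.size}")
```

For the parsing models studied in the paper, the map would send grammar parameters to the distribution (or moments) of observed sentences rather than to a small joint table, but the same kind of rank test would apply.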

[1] T. Rothenberg. Identification in Parametric Models, 1971.

[2] L. A. Goodman. Exploratory latent structure analysis using both identifiable and unidentifiable models, 1974.

[3] Sartaj Sahni, et al. Computationally Related Problems. SIAM J. Comput., 1974.

[4] J. Kruskal. Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics, 1977.

[5] J. V. Santen, et al. How many parameters can a model have and still be testable, 1985.

[6] Steve Young, et al. Applications of stochastic context-free grammars using the Inside-Outside algorithm, 1990.

[7] F. Pereira, et al. Inside-Outside Reestimation From Partially Bracketed Corpora. ACL, 1992.

[8] Glenn Carroll, et al. Two Experiments on Learning Probabilistic Dependency Grammars from Corpora, 1992.

[9] Joseph T. Chang, et al. Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. Mathematical Biosciences, 1996.

[10] Jason Eisner, et al. Three New Probabilistic Models for Dependency Parsing: An Exploration. COLING, 1996.

[11] Jason Eisner, et al. Bilexical Grammars and their Cubic-Time Parsing Algorithms, 2000.

[12] D. Geiger, et al. Stratified exponential families: Graphical models and model selection, 2001.

[13] Mark A. Paskin, et al. Grammatical Bigrams. NIPS, 2001.

[14] Dan Klein, et al. Conditional Structure versus Conditional Estimation in NLP Models. EMNLP, 2002.

[15] Dan Klein, et al. Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency. ACL, 2004.

[16] Elchanan Mossel, et al. Learning nonsingular phylogenies and hidden Markov models. Symposium on the Theory of Computing, 2005.

[17] Thomas L. Griffiths, et al. Bayesian Inference for PCFGs via Markov Chain Monte Carlo. NAACL, 2007.

[18] Dan Klein, et al. Analyzing the Errors of Unsupervised Learning. ACL, 2008.

[19] Sham M. Kakade, et al. A spectral algorithm for learning Hidden Markov Models. J. Comput. Syst. Sci., 2008.

[20] C. Matias, et al. Identifiability of parameters in latent structure models with many observed variables, 2008. arXiv:0809.5032.

[21] Byron Boots, et al. Reduced-Rank Hidden Markov Models. AISTATS, 2009.

[22] Le Song, et al. A Spectral Algorithm for Latent Tree Graphical Models. ICML, 2011.

[23] Le Song, et al. Spectral Methods for Learning Multivariate Latent Tree Structure. NIPS, 2011.

[24] Seth Sullivant, et al. Identifiability of Two-Tree Mixtures for Group-Based Models. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2009.

[25] Ariadna Quattoni, et al. Spectral Learning for Non-Deterministic Dependency Parsing. EACL, 2012.

[26] Karl Stratos, et al. Spectral Learning of Latent-Variable PCFGs. ACL, 2012.

[27] Michael Collins, et al. Spectral Dependency Parsing with Latent Variables. EMNLP-CoNLL, 2012.

[28] Seth Sullivant, et al. Identifiability of Large Phylogenetic Mixture Models. Bulletin of Mathematical Biology, 2010.

[29] Anima Anandkumar, et al. A Method of Moments for Mixture Models and Hidden Markov Models. COLT, 2012.