Probabilistic grammars and hierarchical Dirichlet processes

This article focuses on the use of probabilistic context-free grammars (PCFGs) in natural language processing involving a large-scale natural language parsing task. It describes detailed, highly-structured Bayesian modelling in which model dimension and complexity responds naturally to observed data. The framework, termed hierarchical Dirichlet process probabilistic context-free grammar (HDP-PCFG), involves structured hierarchical Dirichlet process modelling and customized model fitting via variational methods to address the problem of syntactic parsing and the underlying problems of grammar induction and grammar refinement. The central object of study is the parse tree, which can be used to describe a substantial amount of the syntactic structure and relational semantics of natural language sentences. The article first provides an overview of the formal probabilistic specification of the HDP-PCFG, algorithms for posterior inference under the HDP-PCFG, and experiments on grammar learning run on the Wall Street Journal portion of the Penn Treebank.

[1]  T. Ferguson A Bayesian Analysis of Some Nonparametric Problems , 1973 .

[2]  T. Ferguson Prior Distributions on Spaces of Probability Measures , 1974 .

[3]  C. Antoniak Mixtures of Dirichlet Processes with Applications to Bayesian Nonparametric Problems , 1974 .

[4]  Steve Young,et al.  Applications of stochastic context-free grammars using the Inside-Outside algorithm , 1990 .

[5]  J. Sethuraman A CONSTRUCTIVE DEFINITION OF DIRICHLET PRIORS , 1991 .

[6]  F. Pereira,et al.  Inside-Outside Reestimation From Partially Bracketed Corpora , 1992, ACL.

[7]  Glenn Carroll,et al.  Two Experiments on Learning Probabilistic Dependency Grammars from Corpora , 1992 .

[8]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[9]  Andreas Stolcke,et al.  Inducing Probabilistic Grammars by Bayesian Model Merging , 1994, ICGI.

[10]  Dimitri P. Bertsekas,et al.  Nonlinear Programming , 1997 .

[11]  Eugene Charniak,et al.  Tree-Bank Grammars , 1996, AAAI/IAAI, Vol. 2.

[12]  Dekai Wu,et al.  Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora , 1997, CL.

[13]  Mark Johnson,et al.  PCFG Models of Linguistic Tree Representations , 1998, CL.

[14]  Eugene Charniak,et al.  A Maximum-Entropy-Inspired Parser , 2000, ANLP.

[15]  Lancelot F. James,et al.  Gibbs Sampling Methods for Stick-Breaking Priors , 2001 .

[16]  Ulf Hermjakob,et al.  Parsing and Question Classification for Question Answering , 2001, ACL 2001.

[17]  Carl E. Rasmussen,et al.  Factorial Hidden Markov Models , 1997 .

[18]  H. Ishwaran,et al.  Exact and approximate sum representations for the Dirichlet process , 2002 .

[19]  Daniel Jurafsky,et al.  Automatic Labeling of Semantic Roles , 2002, CL.

[20]  Michael Collins,et al.  Head-Driven Statistical Models for Natural Language Parsing , 2003, CL.

[21]  Dan Klein,et al.  Accurate Unlexicalized Parsing , 2003, ACL.

[22]  P. Kantor Foundations of Statistical Natural Language Processing , 2001, Information Retrieval.

[23]  Christian P. Robert,et al.  Monte Carlo Statistical Methods , 2005, Springer Texts in Statistics.

[24]  Kenichi Kurihara,et al.  An Application of the Variational Bayesian Approach to Probabilistic Context-Free Grammars , 2004 .

[25]  Daniel Marcu,et al.  What’s in a translation rule? , 2004, NAACL.

[26]  Dan Klein,et al.  Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency , 2004, ACL.

[27]  Jun'ichi Tsujii,et al.  Probabilistic CFG with Latent Annotations , 2005, ACL.

[28]  Noah A. Smith,et al.  Contrastive Estimation: Training Log-Linear Models on Unlabeled Data , 2005, ACL.

[29]  Yasubumi Sakakibara,et al.  Grammatical inference in bioinformatics , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Alaa A. Kharbouch,et al.  Three models for the description of language , 1956, IRE Trans. Inf. Theory.

[31]  Michael I. Jordan,et al.  Hierarchical Dirichlet Processes , 2006 .

[32]  Thomas L. Griffiths,et al.  Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models , 2006, NIPS.

[33]  Michael I. Jordan,et al.  Variational inference for Dirichlet process mixtures , 2006 .

[34]  Dan Klein,et al.  Learning Accurate, Compact, and Interpretable Tree Annotation , 2006, ACL.

[35]  Stephen G. Walker,et al.  Sampling the Dirichlet Mixture Model with Slices , 2006, Commun. Stat. Simul. Comput..

[36]  G. Roberts,et al.  Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models , 2007, 0710.4228.

[37]  Thomas Hofmann,et al.  A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation , 2007 .

[38]  Yee Whye Teh,et al.  Collapsed Variational Dirichlet Process Mixture Models , 2007, IJCAI.

[39]  Christopher D. Manning,et al.  The Infinite Tree , 2007, ACL.

[40]  Dan Klein,et al.  Learning and Inference for Hierarchically Split PCFGs , 2007, AAAI.

[41]  Thomas L. Griffiths,et al.  Bayesian Inference for PCFGs via Markov Chain Monte Carlo , 2007, NAACL.

[42]  Jean-Christophe Nebel,et al.  A probabilistic context-free grammar for the detection of binding sites from a protein sequence , 2007, BMC Systems Biology.

[43]  Phil Blunsom,et al.  Bayesian Synchronous Grammar Induction , 2008, NIPS.

[44]  A. Gelfand,et al.  The Nested Dirichlet Process , 2008 .

[45]  Michael I. Jordan,et al.  Graphical Models, Exponential Families, and Variational Inference , 2008, Found. Trends Mach. Learn..

[46]  John DeNero,et al.  Sampling Alignment Structure under a Bayesian Translation Model , 2008, EMNLP.