The effect of non-tightness on Bayesian estimation of PCFGs

Probabilistic context-free grammars (PCFGs) have the unusual property of not always defining tight distributions (i.e., the sum of the “probabilities” of the trees the grammar generates can be less than one). This paper reviews how this non-tightness can arise and discusses its impact on Bayesian estimation of PCFGs. We begin by presenting the notion of “almost everywhere tight grammars” and show that linear CFGs satisfy it. We then propose three different ways of reinterpreting non-tight PCFGs to make them tight, show that the Bayesian estimators in Johnson et al. (2007) are correct under one of them, and provide MCMC samplers for the other two. We conclude with a discussion of the empirical impact of tightness.
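To make non-tightness concrete, here is a minimal sketch using a standard textbook grammar that is an assumption of this illustration and is not taken from the abstract: S → S S with probability p, and S → a with probability 1 - p. The total probability Z of all finite trees is the least non-negative solution of Z = p·Z² + (1 - p), which can be approximated by iterating from Z = 0. The function names below are hypothetical.

```python
def partition_function(p, iterations=1000):
    """Approximate the total probability mass of finite trees for the toy
    grammar S -> S S (prob. p) | a (prob. 1 - p).

    Iterating Z <- p*Z**2 + (1 - p) from Z = 0 increases monotonically
    toward the least fixed point of the equation.
    """
    z = 0.0
    for _ in range(iterations):
        z = p * z * z + (1.0 - p)
    return z


def closed_form(p):
    """Least non-negative root of Z = p*Z**2 + (1 - p), for 0 < p < 1.

    The two roots are 1 and (1 - p)/p, so the grammar is tight
    exactly when p <= 1/2.
    """
    return 1.0 if p <= 0.5 else (1.0 - p) / p


if __name__ == "__main__":
    for p in (0.3, 0.6, 0.75):
        z = partition_function(p)
        status = "tight" if abs(z - 1.0) < 1e-9 else "non-tight"
        print(f"p={p}: Z~{z:.6f} (closed form {closed_form(p):.6f}), {status}")
```

With p = 0.75, for example, the finite trees carry a total mass of only 1/3; the remaining 2/3 corresponds to derivations that never terminate. Values of p very close to 1/2 converge slowly under this simple iteration, so the sketch is meant for intuition and is not a general-purpose solver.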

[1] Steve Young, et al. Applications of stochastic context-free grammars using the Inside-Outside algorithm, 1990.

[2] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[3] Hoon Kim, et al. Monte Carlo Statistical Methods, 2000, Technometrics.

[4] Dan Klein, et al. Corpus-Based Induction of Syntactic Structure: Models of Dependency and Constituency, 2004, ACL.

[5] Kenichi Kurihara, et al. Variational Bayesian Grammar Induction for Natural Language, 2006, ICGI.

[6] Noah A. Smith, et al. Novel estimation methods for unsupervised discovery of latent structure in natural language text, 2007.

[7] Mikko Alava, et al. Branching Processes, 2009, Encyclopedia of Complexity and Systems Science.

[8] Thomas L. Griffiths, et al. Bayesian Inference for PCFGs via Markov Chain Monte Carlo, 2007, NAACL.

[9] Vladimir Solmon, et al. The estimation of stochastic context-free grammars using the Inside-Outside algorithm, 2003.

[10] Giorgio Satta, et al. Computing Partition Functions of PCFGs, 2009, Research on Language and Computation.

[11] C. S. Wetherell, et al. Probabilistic Languages: A Review and Some Open Questions, 1980, CSUR.

[12] Taylor L. Booth, et al. Applying Probability Measures to Abstract Languages, 1973, IEEE Transactions on Computers.

[13] Noah A. Smith, et al. Empirical Risk Minimization for Probabilistic Grammars: Sample Complexity and Hardness of Learning, 2012, CL.

[14] 李幼升, et al. Ph, 1989.

[15] M. Degroot, et al. Probability and Statistics, 2021, Examining an Operational Approach to Teaching Probability.

[16] Zhiyi Chi, et al. Estimation of Probabilistic Context-Free Grammars, 1998, Comput. Linguistics.

[17] Zhiyi Chi, et al. Statistical Properties of Probabilistic Context-Free Grammars, 1999, CL.