Consistent unsupervised estimators for anchored PCFGs

Learning probabilistic context-free grammars (PCFGs) from strings has been a classic problem in computational linguistics since Horning (1969). Here we present an algorithm based on distributional learning that is a consistent estimator for a large class of PCFGs satisfying certain natural conditions, including being anchored (Stratos et al., 2016). We proceed via a reparameterization of (top–down) PCFGs that we call a bottom–up weighted context-free grammar. We show that if the grammar is anchored and satisfies additional restrictions on its ambiguity, then the parameters can be directly related to distributional properties of the anchoring strings. We also show the asymptotic correctness of a naive estimator and present simulations on synthetic data showing that algorithms based on this approach have good finite-sample behavior.

[1]  Ryo Yoshinaka,et al.  Distributional Learning of Context-Free and Multiple Context-Free Grammars , 2016 .

[2]  Lillian Lee,et al.  Learning of Context-Free Languages: A Survey of the Literature , 1996 .

[3]  A. Rényi.  On Measures of Entropy and Information , 1961 .

[4]  Kousha Etessami,et al.  Polynomial Time Algorithms for Multi-Type Branching Processes and Stochastic Context-Free Grammars , 2012, ArXiv.

[5]  Kousha Etessami,et al.  Polynomial time algorithms for multi-type branching processes and stochastic context-free grammars , 2012, STOC '12.

[6]  Aurélien Lemay,et al.  Learning regular languages using RFSAs , 2004, Theor. Comput. Sci..

[7]  Vladimir Solmon,et al.  The estimation of stochastic context-free grammars using the Inside-Outside algorithm , 2003 .

[8]  Steve Young,et al.  Applications of stochastic context-free grammars using the Inside-Outside algorithm , 1990 .

[9]  Giorgio Satta,et al.  Computing Partition Functions of PCFGs , 2008 .

[10]  Nathaniel J. Smith,et al.  Bootstrapping language acquisition , 2017, Cognition.

[11]  Pieter W. Adriaans,et al.  Learning Shallow Context-free Languages under Simple Distributions , 2001 .

[12]  Sham M. Kakade,et al.  Identifiability and Unmixing of Latent Parse Trees , 2012, NIPS.

[13]  Zellig S. Harris,et al.  From Phoneme to Morpheme , 1955 .

[14]  James Jay Horning,et al.  A study of grammatical inference , 1969 .

[15]  Alexander Clark,et al.  Consistent unsupervised estimators for anchored PCFGs , 2021 .

[16]  Zhiyi Chi,et al.  Statistical Properties of Probabilistic Context-Free Grammars , 1999, CL.

[17]  Noah A. Smith,et al.  Weighted and Probabilistic Context-Free Grammars Are Equally Expressive , 2007, CL.

[18]  Ryo Yoshinaka,et al.  Probabilistic learnability of context-free grammars with basic distributional properties from positive examples , 2016, Theor. Comput. Sci..

[19]  Jason Eisner,et al.  Inside-Outside and Forward-Backward Algorithms Are Just Backprop (tutorial paper) , 2016, SPNLP@EMNLP.

[20]  Noah A. Smith,et al.  Empirical Risk Minimization for Probabilistic Grammars: Sample Complexity and Hardness of Learning , 2012, CL.

[21]  J. Baker.  Trainable grammars for speech recognition , 1979 .

[22]  Sandra E. Hutchins.  Moments of string and derivation lengths of stochastic context-free grammars , 1972, Inf. Sci..

[23]  Morten H. Christiansen,et al.  Language Learning as Language Use: A Cross-Linguistic Model of Child Language Development , 2019, Psychological review.

[24]  C. de Marcken.  On the Unsupervised Induction of Phrase-Structure Grammars , 1999 .

[25]  Kenneth Ward Church,et al.  Word Association Norms, Mutual Information, and Lexicography , 1989, ACL.

[26]  Fernando Pereira,et al.  Inside-Outside Reestimation From Partially Bracketed Corpora , 1992, HLT.

[27]  R N Aslin,et al.  Statistical Learning by 8-Month-Old Infants , 1996, Science.

[28]  Karl Stratos,et al.  Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models , 2016, TACL.

[29]  Tadao Kasami,et al.  On Multiple Context-Free Grammars , 1991, Theor. Comput. Sci..

[30]  Lisa Pearl,et al.  Experimental Syntax and Island Effects: Computational models of acquisition for islands , 2013 .

[31]  Michelle A. Hollander,et al.  Affectedness and direct objects: The role of lexical semantics in the acquisition of verb argument structure , 1991, Cognition.

[32]  Hermann Ney,et al.  On structuring probabilistic dependences in stochastic language modelling , 1994, Comput. Speech Lang..