A Note on the Expressive Power of Probabilistic Context Free Grammars

We examine the expressive power of probabilistic context-free grammars (PCFGs), focusing on the use of probabilities as a mechanism for reducing ambiguity by filtering out unwanted parses. The probabilities in a PCFG induce an ordering on the set of trees that yield a given input sentence. A PCFG parser returns the trees of maximum probability for that sentence and discards all others. This mechanism is naturally viewed as a way of defining a new class of tree languages. We formalize the tree language thus defined, study its expressive power, and show that it goes beyond context-freeness. While this added expressive power helps to reduce ambiguity, we show that it is undecidable in general whether a PCFG removes all ambiguities.
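The filtering mechanism described above can be illustrated with a small sketch. It is not the paper's formalization: the toy grammars, the function name `best_trees`, and the restriction to Chomsky normal form are assumptions made for illustration only. The code enumerates every tree that yields a sentence, computes each tree's probability as the product of its rule probabilities, and keeps only the trees of maximum probability. Exact fractions are used so that ties are detected reliably.

```python
from fractions import Fraction


def best_trees(binary, lexical, start, words):
    """Return (max probability, trees attaining it) for a CNF PCFG.

    binary:  {(lhs, left, right): prob} for rules lhs -> left right
    lexical: {(lhs, word): prob}        for rules lhs -> word
    Trees are nested tuples: (symbol, word) or (symbol, left, right).
    """
    words = tuple(words)

    def parses(sym, i, j):
        # Yield (tree, prob) for every derivation of words[i:j] from sym.
        if j - i == 1:
            p = lexical.get((sym, words[i]))
            if p is not None:
                yield (sym, words[i]), p
            return
        for (lhs, left, right), p in binary.items():
            if lhs != sym:
                continue
            for k in range(i + 1, j):  # split point between the children
                for lt, lp in parses(left, i, k):
                    for rt, rp in parses(right, k, j):
                        yield (sym, lt, rt), p * lp * rp

    all_parses = list(parses(start, 0, len(words)))
    if not all_parses:
        return None, []
    top = max(p for _, p in all_parses)
    return top, [t for t, p in all_parses if p == top]


# Hypothetical toy grammar: "a a a" has four trees, one of which wins.
lexical = {("S", "a"): Fraction(1, 2), ("X", "a"): Fraction(1)}
skewed = {("S", "S", "X"): Fraction(3, 10), ("S", "X", "S"): Fraction(2, 10)}
# Same rules with equal weights: all four trees tie.
uniform = {("S", "S", "X"): Fraction(1, 4), ("S", "X", "S"): Fraction(1, 4)}

sentence = ["a", "a", "a"]
top_skewed, trees_skewed = best_trees(skewed, lexical, "S", sentence)
top_uniform, trees_uniform = best_trees(uniform, lexical, "S", sentence)
print(top_skewed, len(trees_skewed))
print(top_uniform, len(trees_uniform))
```

With the skewed weights a single left-branching tree survives the filter, whereas with uniform weights all four trees share the maximum probability and the ambiguity remains. The exhaustive enumeration is exponential in sentence length and is meant only to make the definition concrete; a practical parser would use a dynamic-programming algorithm instead.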
