What are Treebank Grammars

State-of-the-art syntactic disambiguators for natural language employ "Treebank Grammars": probabilistic grammars projected directly from annotated corpora (treebanks). Treebank Grammars mark a paradigm shift away from manually constructed, a priori fixed linguistic grammars. In this paper we show that describing these systems in the framework of Statistical Estimation Theory requires assuming an unbounded number of parameters. This unboundedness assumption expresses persistent uncertainty about the formal grammar of natural language. We argue that embracing the unboundedness assumption also brings the justification of smoothing techniques within the scope of Estimation Theory.
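As an illustration only, and not the paper's own formulation, the sketch below shows what "projecting" a probabilistic grammar from a treebank usually amounts to. It reads off every local tree (LHS -> RHS) and estimates rule probabilities by relative frequency, P(A -> rhs) = count(A -> rhs) / count(A). Several details are assumptions made for the sketch: trees are encoded as nested tuples, the function names are hypothetical, and the tiny two-tree treebank is invented. To hint at why unseen rules matter under an unboundedness view, the sketch also computes the Good-Turing estimate N1/N of the probability mass belonging to rule types never seen in the sample. Good-Turing is a standard estimate chosen here only to illustrate smoothing in general.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

# Assumed encoding: a tree is (label, child1, child2, ...); a leaf is a word (str).
Tree = Union[str, tuple]
Rule = Tuple[str, Tuple[str, ...]]


def extract_rules(tree: Tree, counts: Counter) -> None:
    """Add one count for every local tree (LHS -> RHS) found in `tree`."""
    if isinstance(tree, str):
        return  # a word is a leaf and contributes no rule of its own
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        extract_rules(child, counts)


def count_rules(treebank: List[Tree]) -> Counter:
    """Count rule occurrences over the whole (hypothetical) treebank."""
    counts: Counter = Counter()
    for tree in treebank:
        extract_rules(tree, counts)
    return counts


def relative_frequency(counts: Counter) -> Dict[Rule, float]:
    """Relative-frequency estimate: P(A -> rhs) = count(A -> rhs) / count(A)."""
    lhs_totals: Counter = Counter()
    for (lhs, _), c in counts.items():
        lhs_totals[lhs] += c
    return {rule: c / lhs_totals[rule[0]] for rule, c in counts.items()}


def good_turing_unseen_mass(counts: Counter) -> float:
    """Good-Turing estimate of the total probability of unseen rule types: N1 / N."""
    n = sum(counts.values())
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / n if n else 0.0


# Invented two-sentence treebank, for illustration only.
treebank = [
    ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VBD", "barked"))),
    ("S", ("NP", ("DT", "the"), ("NN", "cat")), ("VP", ("VBD", "slept"))),
]

counts = count_rules(treebank)
grammar = relative_frequency(counts)
print(grammar[("S", ("NP", "VP"))])    # 1.0
print(grammar[("NN", ("dog",))])       # 0.5
print(good_turing_unseen_mass(counts))  # 0.3333333333333333 (4 singleton rules / 12 rule tokens)
```

The relative-frequency grammar gives every rule absent from the sample probability zero. Meanwhile, the nonzero Good-Turing mass (here 4/12, from the four rules seen exactly once) suggests that more rule types remain unobserved. A practical smoother would redistribute that mass among unseen rules, typically per left-hand side rather than pooled over all rules as in this sketch.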
