The availability of large, syntactically bracketed corpora such as the Penn Tree Bank affords us the opportunity to automatically build or train broad-coverage grammars, and in particular to train probabilistic grammars. A number of recent parsing experiments have also indicated that grammars whose production probabilities are dependent on the context can be more effective than context-free grammars in selecting a correct parse. To make maximal use of context, we have automatically constructed, from the Penn Tree Bank version 2, a grammar in which the symbols S and NP are the only real non-terminals, and the other non-terminals or grammatical nodes are in effect embedded into the right-hand sides of the S and NP rules. For example, one of the rules extracted from the tree bank would be S -> NP VBX JJ CC VBX NP [1] (where NP is a non-terminal and the other symbols are terminals, i.e. part-of-speech tags of the Tree Bank). The most common structure in the Tree Bank associated with this expansion is (S NP (VP (VP VBX (ADJ JJ) CC (VP VBX NP)))) [2]. So if our parser uses rule [1] in parsing a sentence, it will generate structure [2] for the corresponding part of the sentence. Using 94% of the Penn Tree Bank for training, we extracted 32,296 distinct rules (23,386 for S, and 8,910 for NP). We also built a smaller version of the grammar based on higher-frequency patterns for use as a back-up when the larger grammar is unable to produce a parse due to memory limitations. We applied this parser to 1,989 Wall Street Journal sentences (separate from the training set and with no limit on sentence length). Of the parsed sentences (1,899), the percentage of no-crossing sentences is 33.9%, and Parseval recall and precision are 73.43% and 72.61%.
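
Below is a minimal Python sketch (not the authors' code) of the flattening idea described above: it reads a bracketed structure like [2] and reads off the corresponding flat rule [1], keeping only S and NP as non-terminals and dissolving every other node into the right-hand side. The parsing helper, the treatment of bare NP and part-of-speech leaves, and the omission of details such as lexical items and punctuation are assumptions made for this illustration.

    # Minimal sketch under the assumptions stated above; not the authors' implementation.

    KEPT = {"S", "NP"}  # the only non-terminals kept as symbols in rule right-hand sides

    def parse_bracketed(s):
        """Parse '(S NP (VP ...))' into (label, [children]) tuples; bare tokens are leaves."""
        tokens = s.replace("(", " ( ").replace(")", " ) ").split()
        pos = 0
        def read():
            nonlocal pos
            if tokens[pos] == "(":
                pos += 1
                label = tokens[pos]
                pos += 1
                children = []
                while tokens[pos] != ")":
                    children.append(read())
                pos += 1  # consume ')'
                return (label, children)
            tok = tokens[pos]
            pos += 1
            return tok  # a leaf: a part-of-speech tag (or a bare NP in the abbreviated example)
        return read()

    def flatten_rhs(node):
        """Right-hand side of the flattened rule rooted at `node`: embedded S/NP nodes
        stay as non-terminals, leaves stay as terminals, and every other node
        (VP, ADJ, ...) is dissolved into its children."""
        rhs = []
        for child in node[1]:
            if isinstance(child, str):      # a leaf symbol
                rhs.append(child)
            elif child[0] in KEPT:          # keep S and NP as non-terminals
                rhs.append(child[0])
            else:                           # dissolve intermediate nodes
                rhs.extend(flatten_rhs(child))
        return rhs

    tree = parse_bracketed("(S NP (VP (VP VBX (ADJ JJ) CC (VP VBX NP))))")
    print(tree[0], "->", " ".join(flatten_rhs(tree)))   # S -> NP VBX JJ CC VBX NP

In training, each flattened rule would additionally be paired with the structures it was extracted from, so that the most frequent structure (such as [2]) can be restored when the rule is used at parse time.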