Abstract Lexicalized Tree Adjoining Grammars (LTAGs) have been applied to many NLP applications. Evaluating the coverage of an LTAG is important for both its developers and its users. In this paper, we describe a method that estimates a grammar's coverage on annotated corpora by first automatically extracting a Treebank grammar from the corpus and then calculating the overlap between the two grammars. We used the method to test the coverage of the XTAG grammar, a large-scale hand-crafted grammar for English, on the English Penn Treebank, and the results show that the grammar can cover at least 97.2% of template tokens in the Treebank. This method has several advantages: first, the whole process is semi-automatic and requires little human effort; second, the coverage can be calculated at the sentence level or at finer-grained levels; third, the method provides a set of new templates that can be added to the grammar to improve its coverage; fourth, there is no need to parse the corpus.
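The overlap computation described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes that each template is already reduced to a canonical string, that two templates match only when their strings are identical, and that the extracted Treebank grammar is available as a table of template frequencies. The function name `template_coverage`, the bracketed template strings, and the counts are all hypothetical.

```python
from collections import Counter


def template_coverage(extracted_counts, handcrafted_templates):
    """Estimate how much of an extracted grammar a hand-crafted grammar covers.

    extracted_counts: maps each template extracted from the corpus to the
        number of times it occurs there (its token count).
    handcrafted_templates: set of templates in the hand-crafted grammar.

    Assumption: templates are canonical strings compared by exact equality.
    """
    total_tokens = sum(extracted_counts.values())
    if total_tokens == 0:
        raise ValueError("the extracted grammar contains no templates")

    matched = [t for t in extracted_counts if t in handcrafted_templates]
    matched_tokens = sum(extracted_counts[t] for t in matched)

    # Unmatched templates, most frequent first: candidates for addition
    # to the hand-crafted grammar.
    missing = sorted(
        (t for t in extracted_counts if t not in handcrafted_templates),
        key=lambda t: -extracted_counts[t],
    )

    type_coverage = len(matched) / len(extracted_counts)
    token_coverage = matched_tokens / total_tokens
    return type_coverage, token_coverage, missing


# Toy data (hypothetical, not taken from the Penn Treebank or XTAG).
extracted = Counter({
    "S(NP VP(V NP))": 70,
    "S(NP VP(V))": 25,
    "S(NP VP(V NP PP))": 5,
})
handcrafted = {"S(NP VP(V NP))", "S(NP VP(V))"}

type_cov, token_cov, missing = template_coverage(extracted, handcrafted)
```

On this toy input, two of the three template types match, and they account for 95 of the 100 template tokens, so token-level coverage is 0.95 while type-level coverage is about 0.67. The single unmatched template is returned as a candidate for extending the grammar. No parsing is involved at any step, since only the two template inventories are compared.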