Abstract Lexicalized Tree Adjoining Grammars (LTAGs) have been applied to many NLP applications. Evaluating the coverage of an LTAG is important for both its developers and its users. In this paper, we describe a method that estimates a grammar’s coverage on annotated corpora by first automatically extracting a Treebank grammar from the corpus and then calculating the overlap between the two grammars. We used the method to test the coverage of the XTAG grammar, a large-scale hand-crafted grammar for English, on the English Penn Treebank; the result shows that the grammar covers at least 97.2% of the template tokens in the Treebank. This method has several advantages: first, the whole process is semi-automatic and requires little human effort; second, the coverage can be calculated at the sentence level or at more fine-grained levels; third, the method provides a set of new templates that can be added to the grammar to improve its coverage; fourth, there is no need to parse the corpus.
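The overlap computation described in the abstract can be illustrated with a minimal sketch. It assumes that each template has already been reduced to a comparable string identifier, which is a simplification: the abstract calls the real process semi-automatic, so matching extracted templates against a hand-crafted grammar presumably needs more than string equality. The function name `template_coverage` and the template labels are hypothetical and do not come from the paper.

```python
from collections import Counter


def template_coverage(treebank_templates, grammar_templates):
    """Estimate how much of a Treebank grammar a hand-crafted grammar covers.

    treebank_templates: iterable of template identifiers, one per token
        occurrence in the corpus (assumed already extracted and normalized).
    grammar_templates: set of template identifiers in the hand-crafted grammar.
    """
    counts = Counter(treebank_templates)
    total_tokens = sum(counts.values())

    # Templates seen in the Treebank but absent from the grammar, with their
    # frequencies; these are candidates for adding to the grammar.
    missing = {t: n for t, n in counts.items() if t not in grammar_templates}
    covered_tokens = total_tokens - sum(missing.values())

    return {
        # Fraction of template occurrences (tokens) that the grammar covers.
        "token_coverage": covered_tokens / total_tokens if total_tokens else 0.0,
        # Fraction of distinct templates (types) that the grammar covers.
        "type_coverage": (len(counts) - len(missing)) / len(counts) if counts else 0.0,
        "missing": missing,
    }


# Toy data with hypothetical template labels.
treebank = ["T_trans", "T_trans", "T_trans", "T_intrans", "T_rare"]
grammar = {"T_trans", "T_intrans", "T_unused"}
result = template_coverage(treebank, grammar)
print(result["token_coverage"], result["missing"])
```

Token-level coverage weights each template by its corpus frequency, so a grammar can cover most tokens while still lacking many rare template types; the `missing` dictionary corresponds to the set of new templates that could be added to the grammar.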