Paper Information - Evaluating the Coverage of LTAGs on Annotated Corpora


Abstract: Lexicalized Tree Adjoining Grammars (LTAGs) have been applied to many NLP applications. Evaluating the coverage of an LTAG is important for both its developers and its users. In this paper, we describe a method that estimates a grammar's coverage on annotated corpora by first automatically extracting a Treebank grammar from the corpus and then calculating the overlap between the two grammars. We used the method to test the coverage of the XTAG grammar, a large-scale hand-crafted grammar for English, on the English Penn Treebank, and the result shows that the grammar can cover at least 97.2% of template tokens in the Treebank. This method has several advantages: first, the whole process is semi-automatic and requires little human effort; second, the coverage can be calculated at the sentence level or at more fine-grained levels; third, the method provides a set of new templates that can be added to the grammar to improve its coverage; fourth, there is no need to parse the corpus.
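The overlap calculation mentioned in the abstract can be illustrated with a minimal sketch. It assumes that templates extracted from the corpus and templates in the hand-crafted grammar can be compared by a shared identifier, which is a simplification: the abstract does not say how two templates are matched. The function name `template_coverage`, the template names, and the counts are all hypothetical toy values, not figures from the paper.

```python
from collections import Counter


def template_coverage(treebank_counts, grammar_templates):
    """Estimate how much of a Treebank grammar a hand-crafted grammar covers.

    treebank_counts: Counter mapping each extracted template (hypothetical
        string identifier) to the number of times it occurs in the corpus.
    grammar_templates: set of template identifiers in the hand-crafted grammar.

    Returns (type_coverage, token_coverage, missing), where `missing` lists
    the uncovered templates, most frequent first.
    """
    total_tokens = sum(treebank_counts.values())
    if total_tokens == 0:
        raise ValueError("treebank_counts contains no template tokens")

    # Assumption: two templates match when their identifiers are equal.
    matched = {t for t in treebank_counts if t in grammar_templates}

    # Type coverage counts each distinct template once.
    type_coverage = len(matched) / len(treebank_counts)
    # Token coverage weights each template by its corpus frequency.
    token_coverage = sum(treebank_counts[t] for t in matched) / total_tokens

    # Uncovered templates are candidates for extending the grammar.
    missing = sorted(
        (t for t in treebank_counts if t not in matched),
        key=lambda t: (-treebank_counts[t], t),
    )
    return type_coverage, token_coverage, missing


# Toy data, invented purely for illustration.
toy_counts = Counter(
    {"transitive": 60, "intransitive": 30, "ditransitive": 8, "rare_template": 2}
)
toy_grammar = {"transitive", "intransitive", "ditransitive", "passive"}

type_cov, token_cov, missing = template_coverage(toy_counts, toy_grammar)
print(f"type coverage:  {type_cov:.2%}")
print(f"token coverage: {token_cov:.2%}")
print("templates to consider adding:", missing)
```

In this toy setting, token coverage is higher than type coverage because the one uncovered template is rare. The list of uncovered templates corresponds to the abstract's third advantage, and no step requires parsing the corpus with the hand-crafted grammar.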

Fei Xia | Martha Palmer
