Paper Information - The Syntactically Annotated ICE Corpus and the Automatic Induction of a Formal Grammar

The International Corpus of English (ICE) is a corpus of national and regional varieties of English. Its one-million-word British component has been constructed, grammatically tagged, and syntactically parsed. This article describes work on the automatic induction of a wide-coverage grammar from this corpus and an empirical evaluation of that grammar. It first describes the corpus and its annotation schemes, then presents empirical statistics for the grammar, and finally evaluates the coverage and accuracy of the grammar when it is applied automatically in a parsing system. Results show that the grammar enabled the parser to achieve a recall of 86.1% and a precision of 83.5%.

Alex Chengyu Fang
