Discourse Data in DiET
The DiET project provides systematically constructed and annotated test items, together with associated tools, that enable fast system debugging and evaluation and automatic linkage from test items to instances in real corpora. This paper concentrates on the discourse test suite and its use. The discourse test suite covers discourse phenomena such as pronouns, definites and ellipsis, and its test items can be used to evaluate the coverage and accuracy of implementations of anaphora resolution algorithms. We also examine the text profiling support within the DiET tools. Text profiling identifies typical and salient corpus characteristics, e.g. the frequency and distribution of part-of-speech tags and vocabulary richness. Profiling also provides candidate sentences instantiating predefined syntactic phenomena, enabling users to select test items appropriate to their domain-specific corpus. The paper shows how the corpus search engine can be used to identify discourse phenomena in a corpus and presents concrete results of this evaluation scenario.
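The profiling measures named above can be illustrated with a minimal sketch. It assumes the corpus is already part-of-speech tagged as (word, tag) pairs using Penn Treebank tags. It uses the type-token ratio as one common measure of vocabulary richness and flags sentences containing pronoun tags as candidate items for anaphora tests. These choices are assumptions for illustration. The function names are hypothetical and do not describe the DiET tools' actual interface or metrics.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A tagged sentence is a list of (word, POS tag) pairs. Penn Treebank tags are
# assumed for illustration; the DiET tools may use a different tagset.
TaggedSentence = List[Tuple[str, str]]

# Hypothetical choice: personal and possessive pronoun tags mark candidate
# sentences for anaphora resolution test items.
PRONOUN_TAGS = {"PRP", "PRP$"}


def pos_distribution(corpus: List[TaggedSentence]) -> Dict[str, float]:
    """Relative frequency of each POS tag across the whole corpus."""
    counts = Counter(tag for sent in corpus for _, tag in sent)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}


def type_token_ratio(corpus: List[TaggedSentence]) -> float:
    """Vocabulary richness: distinct lower-cased word forms / word tokens.

    Only alphabetic tokens are counted, so punctuation does not inflate the ratio.
    """
    words = [w.lower() for sent in corpus for w, _ in sent if w.isalpha()]
    return len(set(words)) / len(words) if words else 0.0


def pronoun_candidates(corpus: List[TaggedSentence]) -> List[int]:
    """Indices of sentences containing at least one pronoun tag."""
    return [i for i, sent in enumerate(corpus)
            if any(tag in PRONOUN_TAGS for _, tag in sent)]


if __name__ == "__main__":
    corpus = [
        [("The", "DT"), ("parser", "NN"), ("failed", "VBD"), (".", ".")],
        [("It", "PRP"), ("failed", "VBD"), ("again", "RB"), (".", ".")],
    ]
    print(pos_distribution(corpus))
    print(type_token_ratio(corpus))
    print(pronoun_candidates(corpus))
```

The type-token ratio falls as a sample grows, so comparisons between corpora are only meaningful for samples of similar size.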