Deriving Rhetorical Complexity Data from the RST-DT Corpus

This paper describes a study of the levels at which different rhetorical relations occur in rhetorical structure trees. In a previous empirical study (Williams and Reiter, 2003) of the RST-DT (Rhetorical Structure Theory Discourse Treebank) Corpus (Carlson et al., 2003), we noticed that certain rhetorical relations tended to occur more frequently at higher levels in a rhetorical structure tree, whereas others seemed to occur more often at lower levels. The present study takes a closer look at the data, partly to test this observation, and partly to investigate related issues such as the relative complexity of satellite and nucleus for each type of relation. One practical application of this investigation would be to guide discourse planning in Natural Language Generation (NLG), so that it reflects more accurately the structures found in documents written by human authors. We present our preliminary findings and discuss their relevance for discourse planning.

Richard Power | Sandra Williams

[1] Alex Lascarides, et al. Combining Hierarchical Clustering and Machine Learning to Predict High-Level Discourse Structure, 2004, COLING.

[2] Ehud Reiter, et al. A corpus analysis of discourse relations for Natural Language Generation, 2003.

[3] Donia Scott, et al. Document Structure, 2003, CL.

[4] Daniel Marcu, et al. An Unsupervised Approach to Recognizing Discourse Relations, 2002, ACL.

[5] Lynn Carlson, et al. Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory, 2001, SIGDIAL Workshop.

[6] C. Mellish, et al. ILEX: an architecture for a dynamic hypertext generation system, 2001, Natural Language Engineering.
