Discourse Chunking and its Application to Sentence Compression

In this paper we consider the problem of analysing sentence-level discourse structure. We introduce discourse chunking (i.e., the identification of intra-sentential nucleus and satellite spans) as an alternative to full-scale discourse parsing. Our experiments show that the proposed modelling approach yields results comparable to the state of the art while exploiting knowledge-lean features and small amounts of discourse annotation. We also demonstrate how discourse chunking can be successfully applied to a sentence compression task.
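To make the idea concrete, the following is a minimal sketch of how chunk labels could drive compression, assuming the chunker has already labelled each token. The tag names (`B-NUC`, `I-NUC`, `B-SAT`, `I-SAT`), the helper functions `spans` and `compress`, and the rule of simply dropping every satellite span are illustrative assumptions, not the method described in the paper.

```python
from typing import List, Tuple

# Hypothetical tag scheme (an assumption, not taken from the paper):
# "B-NUC"/"I-NUC" mark nucleus spans, "B-SAT"/"I-SAT" mark satellite spans.
Tagged = List[Tuple[str, str]]


def spans(tagged: Tagged) -> List[Tuple[str, List[str]]]:
    """Group tagged tokens into (label, tokens) chunks."""
    chunks: List[Tuple[str, List[str]]] = []
    for token, tag in tagged:
        prefix, label = tag.split("-")
        # Start a new chunk on a "B" tag or when the label changes.
        if prefix == "B" or not chunks or chunks[-1][0] != label:
            chunks.append((label, [token]))
        else:
            chunks[-1][1].append(token)
    return chunks


def compress(tagged: Tagged) -> str:
    """Illustrative rule: keep nucleus chunks, drop satellite chunks."""
    kept = [tok for label, toks in spans(tagged) if label == "NUC" for tok in toks]
    return " ".join(kept)


example = [
    ("The", "B-NUC"), ("company", "I-NUC"), ("cut", "I-NUC"), ("jobs", "I-NUC"),
    ("because", "B-SAT"), ("sales", "I-SAT"), ("fell", "I-SAT"),
]
print(compress(example))
```

In this toy sentence the satellite span "because sales fell" is removed and the nucleus is retained. A practical system would likely need a softer criterion than deleting every satellite.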
