Discourse Segmentation for Sentence Compression

Earlier studies have raised the possibility of summarizing at the sentence level. Such simplification helps adapt textual content to a limited space, which makes sentence compression an important resource for automatic summarization systems. However, few studies consider sentence-level discourse segmentation for the compression task and, to our knowledge, none do so for Spanish. In this paper, we study the relationship between discourse segmentation and sentence compression in Spanish. We run a discourse segmenter and observe to what extent the passages deleted by annotators coincide with the discourse segments detected by the system. The main idea is to verify whether automatic discourse segmentation can serve as a basis for identifying the segments to be eliminated in the sentence compression task. We show that discourse segmentation could be a solid first step towards a sentence compression system.
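
The comparison between annotator deletions and discourse segments can be pictured with a small sketch. The code below is only an illustration under stated assumptions: it represents segments as half-open token intervals and an annotator's compression as a keep/delete flag per token, and it counts a deleted passage as matching when both of its edges fall on segment boundaries. The names `deleted_spans` and `aligned_fraction`, the data layout, and the matching criterion are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical representation: a span is a half-open token interval [start, end).
Span = Tuple[int, int]


def deleted_spans(kept: List[bool]) -> List[Span]:
    """Group consecutive deleted tokens (kept[i] is False) into spans."""
    spans: List[Span] = []
    start = None
    for i, is_kept in enumerate(kept):
        if not is_kept and start is None:
            start = i                      # a deleted run begins here
        elif is_kept and start is not None:
            spans.append((start, i))       # the run ended at the previous token
            start = None
    if start is not None:                  # a run that reaches the sentence end
        spans.append((start, len(kept)))
    return spans


def aligned_fraction(segments: List[Span], kept: List[bool]) -> float:
    """Fraction of deleted spans whose two edges both lie on segment boundaries.

    This criterion is an assumption made for illustration only.
    """
    boundaries = {edge for seg in segments for edge in seg}
    spans = deleted_spans(kept)
    if not spans:
        return 0.0
    hits = sum(1 for s, e in spans if s in boundaries and e in boundaries)
    return hits / len(spans)


# Toy input: a 10-token sentence split into three segments by a segmenter.
segments = [(0, 4), (4, 7), (7, 10)]
# The annotator removed tokens 4..6, which is exactly the second segment.
kept = [True] * 4 + [False] * 3 + [True] * 3
print(aligned_fraction(segments, kept))
```

A stricter or looser criterion (for example, token-level overlap instead of exact boundary agreement) would give a different picture; the sketch only shows one way such agreement could be quantified.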
