Discursive Sentence Compression

This paper presents a method for automatic summarization that deletes intra-sentence discourse segments. Each sentence is first divided into elementary discourse units, and the less informative segments are then deleted. The informativeness of each segment is calculated using textual energy, a method that has shown good results in automatic summarization. To analyze the results, we set up an annotation campaign, which revealed interesting aspects of discourse segment elimination as an alternative to the sentence compression task. Results show that annotators disagree strongly on the optimal compressed sentence, and that this disagreement increases with the complexity of the sentence. However, there is some agreement on the decision to delete discourse segments.
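
The pipeline described above (segment, score, delete) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the input is already split into discourse segments by some external segmenter, and it assumes the commonly cited formulation of textual energy, E = -1/2 (S S^T)^2, where S is the segment-by-term frequency matrix. The function names `tokenize`, `textual_energy`, and `compress`, and the `keep_ratio` parameter, are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphabetic tokens only; a real system would likely also
    # remove stop words and lemmatize (assumption, not from the paper).
    return re.findall(r"[a-z]+", text.lower())


def textual_energy(segments):
    """Score each segment by the sum of absolute textual energies in its row.

    Assumed formulation: E = -1/2 * (S S^T)^2, with S the segment-by-term
    frequency matrix.
    """
    counts = [Counter(tokenize(seg)) for seg in segments]
    vocab = sorted({w for c in counts for w in c})
    S = [[c[w] for w in vocab] for c in counts]
    n = len(S)
    # G = S S^T: term overlap between every pair of segments.
    G = [[sum(a * b for a, b in zip(S[i], S[j])) for j in range(n)]
         for i in range(n)]
    # E = -1/2 * G * G: interaction energy between segments.
    E = [[-0.5 * sum(G[i][k] * G[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    return [sum(abs(e) for e in row) for row in E]


def compress(segments, keep_ratio=0.5):
    """Keep the highest-energy segments, preserving their original order."""
    scores = textual_energy(segments)
    n_keep = max(1, round(len(segments) * keep_ratio))
    ranked = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [segments[i] for i in kept]
```

In this sketch, segments that share more vocabulary with the rest of the input receive higher energy and are kept first. The fixed `keep_ratio` threshold is a simplification; the annotation results summarized above suggest that the right amount of deletion is itself hard to agree on.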
