Sentence Compression in Spanish driven by Discourse Segmentation and Language Models

Previous work has shown that Automatic Text Summarization (ATS) by sentence extraction can be improved using sentence compression. In this work we present a sentence compression approach guided by sentence-level discourse segmentation and probabilistic language models (LM). The results presented here show that the proposed solution is able to generate coherent summaries with grammatical compressed sentences. The approach is simple enough to be adapted to other languages.
