Building a Language Model for Local Coherence in Multi-document Summaries Using a Discourse-Enriched Entity-Based Model

Local coherence is an important aspect of multi-document summarization, since good summaries not only condense the most relevant information but also present it in a well-organized structure. One of the most investigated models of local coherence is the Entity-based model, which has been used successfully because it makes coherence straightforward to measure computationally. In particular, this model has been used to evaluate local coherence in multi-document summaries, with promising results. To improve the Entity-based model, we propose a language model for multi-document summaries that integrates it with discourse knowledge, mainly from Cross-document Structure Theory. Our results show that this type of information enriches the Entity-based model by capturing other phenomena inherent to multi-document summaries, such as redundancy and complementarity, and thereby improves on the performance of the original model.
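
To make the Entity-based model concrete, the following is a minimal sketch of a plain entity grid, not the discourse-enriched model proposed here. It assumes each sentence is already annotated with the syntactic role of every entity it mentions (S for subject, O for object, X for other, and "-" for absent); the function names `build_grid` and `transition_probabilities` and the toy summary are hypothetical and serve only as illustration.

```python
from collections import Counter
from itertools import product

# Cell labels of the grid: subject, object, other, absent.
ROLES = ("S", "O", "X", "-")


def build_grid(sentences):
    """Build an entity grid from role-annotated sentences.

    `sentences` is a list of dicts, one per sentence, mapping each
    mentioned entity to its role. Returns a dict mapping each entity
    to its column: one role per sentence, "-" where it is absent.
    """
    entities = sorted({entity for sentence in sentences for entity in sentence})
    return {e: [s.get(e, "-") for s in sentences] for e in entities}


def transition_probabilities(grid, n=2):
    """Return the relative frequency of every length-n role transition."""
    counts = Counter()
    for column in grid.values():
        for i in range(len(column) - n + 1):
            counts[tuple(column[i:i + n])] += 1
    total = sum(counts.values())
    return {
        t: (counts[t] / total if total else 0.0)
        for t in product(ROLES, repeat=n)
    }


# Toy three-sentence summary (invented for illustration only).
summary = [
    {"company": "S", "market": "X"},
    {"company": "O"},
    {"market": "S", "earnings": "O"},
]

grid = build_grid(summary)
features = transition_probabilities(grid, n=2)
```

The resulting dictionary acts as a fixed-length feature vector for the text, which is the kind of representation a ranking or classification model can use to compare the coherence of alternative summaries. A discourse-enriched variant would need richer cell labels than the four roles assumed here.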
