Rhetorics-based multi-document summarization

This paper proposes a new multi-document summarization framework that combines rhetorical roles with corpus-based semantic analysis. The approach captures the semantic and rhetorical relationships between sentences and uses them to assemble coherent summaries. Experiments were conducted on datasets extracted from web-based news and scored with standard evaluation methods. The results suggest that the proposed model is promising when compared with state-of-the-art approaches.
