Graph-Based Methods for Multi-document Summarization: Exploring Relationship Maps, Complex Networks and Discourse Information

In this work we investigate the use of graphs for multi-document summarization. We adapt the traditional Relationship Map approach to the multi-document scenario and, in a hybrid approach, consider adding CST (Cross-document Structure Theory) relations to the adapted model. We also investigate measures derived from graphs and complex networks for sentence selection. We show that superficial graph-based methods are promising for the task and, more importantly, that some of them perform almost as well as a deep approach.
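
To make the general idea concrete, the following is a minimal sketch of graph-based sentence selection in the spirit of a relationship map: sentences pooled from several documents become nodes, lexically similar sentences are linked, and the most connected sentences are selected. It is an illustration under stated assumptions, not the method evaluated in this work: the bag-of-words cosine similarity, the `threshold` value, the degree-based ranking, and all function names (`tokenize`, `cosine`, `build_graph`, `select_by_degree`) are hypothetical choices, and CST relations are not modeled.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Lowercase and keep alphanumeric runs; no stemming or stopword removal (assumption).
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(sentences, threshold=0.5):
    # Nodes are sentence indices; an undirected edge links two sentences
    # whose similarity reaches the (hypothetical) threshold.
    vectors = [Counter(tokenize(s)) for s in sentences]
    adjacency = {i: set() for i in range(len(sentences))}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def select_by_degree(sentences, k=1, threshold=0.5):
    # Rank nodes by degree (ties broken by original position) and
    # return the top-k sentences in their original order.
    adjacency = build_graph(sentences, threshold)
    ranked = sorted(adjacency, key=lambda i: (-len(adjacency[i]), i))
    return [sentences[i] for i in sorted(ranked[:k])]


# Sentences pooled from several (made-up) news texts on the same event.
sentences = [
    "The flood hit the city on Monday.",
    "Monday's flood hit the city hard.",
    "The city flood displaced many families.",
    "Stock markets rose slightly.",
]
summary = select_by_degree(sentences, k=1)
```

Degree is only one possible ranking; other network measures could replace the sort key in `select_by_degree` without changing how the graph is built.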
