Multi-document Summarization using Tensor Decomposition

Extractive summarization of a document collection is the problem of selecting a small subset of sentences that preserves the content and meaning of the original document set as well as possible. In this paper we present a new model for extractive summarization that aims to produce a summary whose information coverage is as close as possible to that of the original document set. We construct a new tensor-based representation that describes the given document set in terms of its topics. We then rank the topics via tensor decomposition and compile a summary from the sentences of the highest-ranked topics.
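The pipeline outlined above (tensor representation, topic ranking by decomposition, sentence selection) can be illustrated with a minimal Python sketch. It is not the model of this paper: the sentences x terms x topics layout, the hand-specified topics, the scoring of topics by the leading singular vector of the topic-mode unfolding, and all function names are assumptions made only for illustration.

```python
import numpy as np


def build_tensor(sentences, topics):
    """Assumed layout: T[s, t, k] = 1 if term t of topic k occurs in sentence s."""
    vocab = sorted({w for terms in topics for w in terms})
    index = {w: i for i, w in enumerate(vocab)}
    T = np.zeros((len(sentences), len(vocab), len(topics)))
    for s, sent in enumerate(sentences):
        words = set(sent.lower().split())  # naive whitespace tokenization
        for k, terms in enumerate(topics):
            for w in terms:
                if w in words:
                    T[s, index[w], k] = 1.0
    return T


def unfold(T, mode):
    """Matricize T along `mode`: rows are indexed by that mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def rank_topics(T):
    """Score topics by the leading left singular vector of the topic-mode
    unfolding (an assumed ranking rule, in the spirit of HOSVD)."""
    U, _, _ = np.linalg.svd(unfold(T, 2), full_matrices=False)
    scores = np.abs(U[:, 0])
    return np.argsort(-scores), scores


def summarize(sentences, topics, max_sentences=2):
    """Take, for each topic in rank order, the not-yet-chosen sentence
    that contains the most terms of that topic."""
    T = build_tensor(sentences, topics)
    order, _ = rank_topics(T)
    chosen = []
    for k in order:
        if len(chosen) >= max_sentences:
            break
        coverage = T[:, :, k].sum(axis=1)  # topic-k terms per sentence
        for s in np.argsort(-coverage, kind="stable"):
            if coverage[s] > 0 and int(s) not in chosen:
                chosen.append(int(s))
                break
    # Restore the original sentence order for readability.
    return [sentences[s] for s in sorted(chosen)]


if __name__ == "__main__":
    # Toy data, invented for this example.
    sentences = [
        "Tensor decomposition ranks the topics",
        "The weather was pleasant today",
        "Topics describe the document set",
        "A summary covers the highest ranked topics",
    ]
    topics = [
        ["tensor", "decomposition", "topics"],
        ["topics", "document", "summary"],
        ["weather"],
    ]
    for line in summarize(sentences, topics, max_sentences=2):
        print(line)
```

In this toy setting, topics that share terms reinforce one another in the leading singular vector, while an isolated topic receives a score close to zero. A realistic system would replace the hand-specified topics and the whitespace tokenizer with proper topic extraction and linguistic preprocessing.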
