On using a quantum physics formalism for multidocument summarization

Multidocument summarization (MDS) aims, for each given query, to extract condensed and relevant information about the different query-related themes present in a set of documents. Many approaches operate in two steps: themes are first identified from the set, and a summary is then formed by extracting salient sentences from the documents associated with each identified theme. Among these approaches, those based on latent semantic analysis (LSA) rely on spectral decomposition techniques to identify the themes. In this article, we propose a major extension of these techniques that relies on the quantum information access (QIA) framework, a framework developed for modeling information access based on the probabilistic formalism of quantum physics. The QIA framework not only exposes the limitations of current LSA-based approaches, but also motivates a new principled criterion for multidocument summarization that addresses these limitations. As a byproduct, it also provides a way to enhance the LSA-based approaches. Extensive experiments on the DUC 2005, 2006, and 2007 datasets show that the proposed approach consistently improves over both the LSA-based approaches and the systems that competed in the yearly DUC competitions. This demonstrates the potential impact of quantum-inspired approaches to information access in general, and of the QIA framework in particular. © 2012 Wiley Periodicals, Inc.
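
The two-step LSA-style pipeline mentioned above (spectral decomposition to find themes, then extraction of one salient sentence per theme) can be illustrated with a minimal sketch. This is not the QIA-based criterion proposed in the article; it only shows the generic baseline idea. The function names `term_sentence_matrix` and `lsa_summary`, the whitespace tokenization, and the raw term-frequency weighting are all assumptions made for illustration.

```python
import numpy as np

def term_sentence_matrix(sentences):
    """Build a raw term-frequency matrix of shape (terms, sentences).

    Assumption: lowercase whitespace tokenization, no stemming or stop words.
    """
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokenized):
        for w in toks:
            A[index[w], j] += 1.0
    return A

def lsa_summary(sentences, n_themes=2):
    """Pick one sentence per latent theme (hypothetical helper)."""
    A = term_sentence_matrix(sentences)
    # Step 1: spectral decomposition. Each row of Vt is a latent theme
    # expressed as loadings over the sentences, ordered by singular value.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(n_themes, Vt.shape[0])
    chosen = []
    # Step 2: for each theme, take the not-yet-chosen sentence with the
    # largest absolute loading (the sign of a singular vector is arbitrary).
    for theme in Vt[:k]:
        for j in np.argsort(-np.abs(theme)):
            if int(j) not in chosen:
                chosen.append(int(j))
                break
    # Return the selected sentences in their original order.
    return [sentences[j] for j in sorted(chosen)]

if __name__ == "__main__":
    docs = [
        "cats chase mice",
        "cats chase mice daily",
        "stocks fell sharply",
        "stocks fell",
    ]
    print(lsa_summary(docs, n_themes=2))
```

In this toy input the two vocabulary blocks do not overlap, so the leading singular vectors separate cleanly into one theme per block, and the sketch returns one sentence from each. Real query-focused systems would add query weighting, term normalization, and redundancy control, none of which is modeled here.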
