Summarizing Search Results using PLSI

In this paper, we investigate generating a set of query-focused summaries from search results. Because the search results for a given query may cover many related topics, summarizing them requires first classifying the results into topics and then summarizing each topic individually. Two types of redundancy must be reduced in this process. First, a topic summary should not repeat content within itself (redundancy within a summary). Second, a topic summary should not be similar to any other topic summary (redundancy between summaries). We focus on the document clustering step and on reducing redundancy between summaries, and we propose a method that uses probabilistic latent semantic indexing (PLSI) to summarize search results. Evaluation results confirm that our method performs well in classifying search results and reducing the redundancy between summaries.
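To make the clustering step concrete, the following is a minimal sketch of standard PLSI fitted by EM on a document-term count matrix, with each document assigned to its most probable topic. It is not the implementation evaluated in the paper: the function name `plsi`, the toy count matrix, the number of topics, and the iteration count are all assumptions chosen for illustration, and the sketch does not cover the summarization or redundancy-reduction steps.

```python
import numpy as np

def plsi(counts, n_topics, n_iter=100, seed=0):
    """Fit PLSI by EM. counts is a (documents x words) count matrix.

    Returns P(z|d) with shape (documents, topics) and
    P(w|z) with shape (topics, words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against division by zero

    # Random initialisation; each row is normalised to sum to 1.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        # joint has shape (documents, topics, words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + eps)

        # M-step: weight the posteriors by the observed counts n(d,w).
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_z_d, p_w_z

# Hypothetical toy data: documents 0-1 use words 0-2, documents 2-3 use words 3-5.
counts = np.array([
    [5, 4, 3, 0, 0, 0],
    [4, 5, 3, 0, 0, 0],
    [0, 0, 0, 5, 3, 4],
    [0, 0, 0, 4, 4, 5],
], dtype=float)

p_z_d, p_w_z = plsi(counts, n_topics=2)
labels = p_z_d.argmax(axis=1)  # hard topic assignment per document
print(labels)
```

Topic indices are arbitrary, so only the grouping of documents is meaningful, not which topic receives which number. In a search-result setting, each row of `counts` would correspond to one retrieved page, and the number of topics would have to be chosen by some model-selection criterion that this sketch leaves out.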
