Intertopic information mining for query‐based summarization

In this article, the authors address the problem of sentence ranking in summarization. Although most existing summarization approaches are concerned with the information embodied in a particular topic (including a set of documents and an associated query) for sentence ranking, they propose a novel ranking approach that incorporates intertopic information mining. Intertopic information, in contrast to intratopic information, is able to reveal pairwise topic relationships and thus can be considered as the bridge across different topics. In this article, the intertopic information is used for transferring word importance learned from known topics to unknown topics under a learning‐based summarization framework. To mine this information, the authors model the topic relationship by clustering all the words in both known and unknown topics according to various kinds of word conceptual labels, which indicate the roles of the words in the topic. Based on the mined relationships, we develop a probabilistic model using manually generated summaries provided for known topics to predict ranking scores for sentences in unknown topics. A series of experiments have been conducted on the Document Understanding Conference (DUC) 2006 data set. The evaluation results show that intertopic information is indeed effective for sentence ranking and the resultant summarization system performs comparably well to the best‐performing DUC participating systems on the same data set.

[1]  Chris H. Q. Ding,et al.  Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization , 2008, SIGIR '08.

[2]  Wenjie Li,et al.  Developing learning strategies for topic-based summarization , 2007, CIKM '07.

[3]  Michael Gamon,et al.  The PYTHY Summarization System: Microsoft Research at DUC 2007 , 2007 .

[4]  Xiaojun Wan,et al.  Using Cross-Document Random Walks for Topic-Focused Multi-Document , 2006, 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006 Main Conference Proceedings)(WI'06).

[5]  Daniel Marcu,et al.  Bayesian Query-Focused Summarization , 2006, ACL.

[6]  Lucy Vanderwende,et al.  Microsoft Research at DUC2006: Task-Focused Summarization with Sentence Simplification and Lexical Expansion , 2006 .

[7]  Vasudeva Varma,et al.  Query Independent Sentence Scoring approach to DUC 2006 , 2006 .

[8]  Dragomir R. Radev,et al.  Using Random Walks for Question-focused Sentence Retrieval , 2005, HLT.

[9]  Liang Zhou,et al.  A Web-Trained Extraction Summarization System , 2003, NAACL.

[10]  Eduard H. Hovy,et al.  Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics , 2003, NAACL.

[11]  Yuji Matsumoto,et al.  Extracting Important Sentences with Support Vector Machines , 2002, COLING.

[12]  Hongyuan Zha,et al.  Generic summarization and keyphrase extraction using mutual reinforcement principle and sentence clustering , 2002, SIGIR '02.

[13]  Daniel Marcu,et al.  Summarization beyond sentence extraction: A probabilistic approach to sentence compression , 2002, Artif. Intell..

[14]  Regina Barzilay,et al.  Information Fusion in the Context of Multi-Document Summarization , 1999, ACL.

[15]  Inderjeet Mani,et al.  Machine Learning of Generic and User-Focused Summarization , 1998, AAAI/IAAI.

[16]  Jade Goldstein-Stewart,et al.  The use of MMR, diversity-based reranking for reordering documents and producing summaries , 1998, SIGIR '98.

[17]  Francine Chen,et al.  A trainable document summarizer , 1995, SIGIR '95.

[18]  S. P. Lloyd,et al.  Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.

[19]  H. P. Edmundson,et al.  New Methods in Automatic Extracting , 1969, JACM.

[20]  Hans Peter Luhn,et al.  The Automatic Creation of Literature Abstracts , 1958, IBM J. Res. Dev..