TSCAN: a novel method for topic summarization and content anatomy

A topic is defined as a seminal event or activity along with all directly related events and activities. It is represented as a chronological sequence of documents by different authors published on the Internet. In this paper, we define a task called topic anatomy, which summarizes and associates core parts of a topic graphically so that readers can understand the content easily. The proposed topic anatomy model, called TSCAN, derives the major themes of a topic from the eigenvectors of a temporal block association matrix. Then, the significant events of the themes and their summaries are extracted by examining the constitution of the eigenvectors. Finally, the extracted events are associated through their temporal closeness and context similarity to form the evolution graph of the topic. Experiments based on the official TDT4 corpus demonstrate that the generated evolution graphs comprehensibly describe the storylines of topics. Moreover, in terms of content coverage and consistency, the produced summaries are superior to those of other summarization methods based on human composed reference summaries.

[1]  Ramesh Nallapati,et al.  Event threading within news topics , 2004, CIKM '04.

[2]  Jon M. Kleinberg,et al.  Bursty and Hierarchical Structure in Streams , 2002, Data Mining and Knowledge Discovery.

[3]  Ani Nenkova,et al.  Automatic Text Summarization of Newswire: Lessons Learned from the Document Understanding Conference , 2005, AAAI.

[4]  Eduard H. Hovy,et al.  Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics , 2003, NAACL.

[5]  Dragomir R. Radev,et al.  LexRank: Graph-based Lexical Centrality as Salience in Text Summarization , 2004, J. Artif. Intell. Res..

[6]  Xin Liu,et al.  Generic text summarization using relevance measure and latent semantic analysis , 2001, SIGIR '01.

[7]  L. Rabiner,et al.  An algorithm for determining the endpoints of isolated utterances , 1974, The Bell System Technical Journal.

[8]  Ming-Syan Chen,et al.  LIPED: HMM-based life profiles for adaptive event detection , 2005, KDD '05.

[9]  Dragomir R. Radev,et al.  LexRank: Graph-based Centrality as Salience in Text Summarization , 2004 .

[10]  Hongyuan Zha,et al.  Generic summarization and keyphrase extraction using mutual reinforcement principle and sentence clustering , 2002, SIGIR '02.

[11]  ChengXiang Zhai,et al.  Discovering evolutionary theme patterns from text: an exploration of temporal text mining , 2005, KDD '05.

[12]  Lawrence R. Rabiner,et al.  An algorithm for determining the endpoints of isolated utterances , 1975, Bell Syst. Tech. J..

[13]  Yiming Yang,et al.  A study of retrospective and on-line event detection , 1998, SIGIR '98.

[14]  Stephen H. Friedberg,et al.  Elementary Linear Algebra: A Matrix Approach , 1999 .

[15]  Christopher C. Yang,et al.  Discovering event evolution graphs from newswires , 2006, WWW '06.

[16]  Yiming Yang,et al.  Topic Detection and Tracking Pilot Study Final Report , 1998 .

[17]  Yuji Matsumoto,et al.  A new approach to unsupervised text summarization , 2001, SIGIR '01.