Storyline-based summarization for news topic retrospection

Electronic newspapers are gradually becoming the main source of news for readers. When faced with numerous reports on a series of events within a topic, news readers benefit from a summary of the story told across those reports, which lets them review the topic efficiently. TDT (Topic Detection and Tracking) techniques identify events and present news titles and keywords. General news readers, however, also need a summarized text that presents how the events evolve in order to review the events under a news topic. This paper proposes a topic retrospection process and implements the SToRe (Story-line based Topic Retrospection) system. SToRe identifies the events under a news topic and composes a summary from which news readers can grasp a sketch of how those events evolved. The system consists of three main functions: event identification, main storyline construction, and storyline-based summarization. Constructing the main storyline removes irrelevant events and exposes the main theme of the topic. Storyline-based summarization extracts representative sentences and uses the main theme as a template to compose the summary. The resulting storyline summary gives readers enough information to understand the development of a news topic, and it also serves as an index for searching the corresponding news reports. Following a design science paradigm, a lab experiment was conducted to evaluate SToRe in a question-and-answer (Q&A) setting. The experimental results show that SToRe enables news readers to capture the evolution of a news topic effectively and efficiently.
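The abstract names SToRe's three stages but does not specify their algorithms. The sketch below is a toy illustration of how such a pipeline could fit together. The assumptions are ours, not the paper's: bag-of-words cosine similarity with a small stopword list, single-pass threshold clustering for event identification (a common TDT baseline), a heuristic that drops events sharing no content with any other event as a stand-in for main storyline construction, and nearest-to-centroid sentence extraction listed in date order as a stand-in for storyline-based summarization. All names (`Report`, `identify_events`, `main_storyline`, `summarize`) and thresholds are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from math import sqrt

# Hypothetical stopword list; a real system would need a proper one
# (and, for Chinese news, word segmentation instead of this regex).
STOPWORDS = {"a", "an", "the", "of", "on", "for", "in", "to", "and", "is"}


@dataclass
class Report:
    date: str  # ISO date string, so lexical order is chronological order
    text: str


def vectorize(text: str) -> Counter:
    """Raw term frequencies; a stand-in for whatever weighting SToRe actually uses."""
    return Counter(t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def identify_events(reports, threshold=0.3):
    """Hypothetical event identification: single-pass clustering in date order.
    Each report joins its most similar event, or opens a new event."""
    events = []  # each event: {"reports": [Report, ...], "centroid": Counter}
    for r in sorted(reports, key=lambda r: r.date):
        v = vectorize(r.text)
        sim, best = max(((cosine(v, e["centroid"]), e) for e in events),
                        key=lambda p: p[0], default=(0.0, None))
        if best is not None and sim >= threshold:
            best["reports"].append(r)
            best["centroid"].update(v)  # summed counts; cosine ignores the scale
        else:
            events.append({"reports": [r], "centroid": Counter(v)})
    return events


def main_storyline(events, min_link=0.1):
    """Hypothetical storyline filter: drop events linked to no other event."""
    if len(events) < 2:
        return events
    return [e for e in events
            if max(cosine(e["centroid"], o["centroid"]) for o in events if o is not e) >= min_link]


def summarize(events):
    """Pick the sentence closest to each event's centroid; list them chronologically."""
    lines = []
    for e in sorted(events, key=lambda e: e["reports"][0].date):
        candidates = [(r.date, s.strip())
                      for r in e["reports"]
                      for s in re.split(r"(?<=[.!?])\s+", r.text) if s.strip()]
        date, sentence = max(candidates, key=lambda c: cosine(vectorize(c[1]), e["centroid"]))
        lines.append(f"{date}: {sentence}")
    return lines


if __name__ == "__main__":
    reports = [
        Report("2024-01-02", "Typhoon hits the coast. Heavy rain floods the city."),
        Report("2024-01-03", "Typhoon rain floods more towns on the coast."),
        Report("2024-01-05", "Government announces typhoon relief funds for flood victims."),
        Report("2024-01-06", "Local team wins the baseball final."),
    ]
    for line in summarize(main_storyline(identify_events(reports))):
        print(line)
    # 2024-01-03: Typhoon rain floods more towns on the coast.
    # 2024-01-05: Government announces typhoon relief funds for flood victims.
```

In this toy run the baseball report forms its own event, shares no terms with the typhoon events, and is dropped from the storyline. The two remaining events each contribute one sentence, tagged with the date of the report it came from, so each summary line can also point back to its source report.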
