A Multi-news Timeline Summarization Algorithm Based on Aging Theory

This paper focuses on news event timeline summarization, a multi-document summarization task that aims to summarize multiple news articles about the same event along a timeline. Most traditional solutions to this problem rely on text surface features and topic-related features, such as the length of each sentence, the position of the sentence in its document, and the number of topic words. These methods ignore the fact that every event has a life cycle comprising birth, growth, maturity, and death. In this paper, a novel approach is presented for summarizing multiple news articles on the same topic that considers both the traditional features and the life-cycle feature of each event. The proposed approach consists of four steps. First, sentences and their publishing dates are extracted from each news article. Second, the extracted sentences are preprocessed to reduce the influence of noise such as synonyms. Third, life-cycle features and four other categories of features commonly used in this field are collected. Finally, an SVM model is trained on these features to identify the summary sentences of the news documents. The approach has been tested on the public datasets DUC-2002 and TAC-2010, and the results show that it is more effective at summarizing multiple news articles along a timeline than existing methods.
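The abstract does not specify how the life-cycle feature is computed. The sketch below is one plausible reading, not the authors' method: it assumes an energy model in the spirit of aging theory, where each day's related articles act as "nutrition" that raises an event's energy and the energy decays by a fixed factor every day. The decay factor `DECAY`, the 0.5 peak-ratio threshold used to separate stages, and the helper names `energy_curve`, `life_stage`, and `sentence_features` are all hypothetical. It builds the kind of per-sentence feature vector described in step three, combining a few traditional features (sentence length, position, topic-word count) with the life-cycle signal.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta

DECAY = 0.8  # hypothetical daily energy retention factor (not from the paper)
STAGES = ("birth", "growth", "maturity", "death")


def energy_curve(pub_dates: list[date]) -> dict[date, float]:
    """Daily event energy: yesterday's energy decays, today's articles add nutrition."""
    nutrition = Counter(pub_dates)  # number of related articles published per day
    day, last = min(nutrition), max(nutrition)
    energy, curve = 0.0, {}
    while day <= last:  # walk every calendar day, including days with no articles
        energy = DECAY * energy + nutrition.get(day, 0)
        curve[day] = energy
        day += timedelta(days=1)
    return curve


def life_stage(curve: dict[date, float], day: date) -> str:
    """Assumed rule: map a day to a stage by its energy relative to the (earliest) peak."""
    peak_day = max(curve, key=curve.get)
    ratio = curve[day] / curve[peak_day]
    if day < peak_day:
        return "birth" if ratio < 0.5 else "growth"
    return "maturity" if ratio >= 0.5 else "death"


@dataclass
class Sentence:
    text: str
    position: int   # 0-based index of the sentence within its article
    pub_date: date  # publishing date of the source article


def sentence_features(s: Sentence, topic_words: set[str],
                      curve: dict[date, float]) -> list[float]:
    """Surface, topic, and life-cycle features for one candidate sentence."""
    tokens = re.findall(r"\w+", s.text.lower())
    peak = max(curve.values())
    return [
        float(len(tokens)),                                  # sentence length
        float(s.position),                                   # position in document
        float(sum(t in topic_words for t in tokens)),        # topic-word count
        curve[s.pub_date] / peak,                            # normalized event energy
        float(STAGES.index(life_stage(curve, s.pub_date))),  # life-cycle stage index
    ]


if __name__ == "__main__":
    # Toy publishing dates for one event: a burst on Jan 2-3, a late follow-up on Jan 10.
    days = ([date(2002, 1, 1)] + [date(2002, 1, 2)] * 3
            + [date(2002, 1, 3)] * 2 + [date(2002, 1, 10)])
    curve = energy_curve(days)
    for d in (date(2002, 1, 1), date(2002, 1, 2), date(2002, 1, 4), date(2002, 1, 10)):
        print(d, round(curve[d], 3), life_stage(curve, d))
    s = Sentence("Floods hit the city.", 0, date(2002, 1, 2))
    print(sentence_features(s, {"floods", "city"}, curve))
```

With these toy dates the energy peaks on 2002-01-03, so the late article on 2002-01-10 falls in the "death" stage even though it adds some energy. In a full pipeline, vectors like these (extended with the remaining feature categories) would be paired with summary/non-summary labels and passed to an SVM classifier such as scikit-learn's `SVC`; that step is omitted here to keep the sketch dependency-free.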
