Ranking Multidocument Event Descriptions for Building Thematic Timelines

This paper addresses timeline generation from traditional news sources. Our system builds thematic timelines for a general-domain topic defined by a user query: it selects and ranks events relevant to the query, and represents each event in the output timeline by a one-sentence description. We present an inter-cluster ranking algorithm that takes events from multiple clusters as input and selects the most salient and relevant ones. In our work, a cluster contains all the events that happen on a specific date. The algorithm uses temporal information derived from a large collection of texts that have undergone extensive temporal analysis. This temporal information is combined with textual content in an event scoring model that ranks events by salience and query relevance.
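To make the idea of inter-cluster ranking concrete, the following is a minimal sketch and not the scoring model of the paper, whose exact form the abstract does not give. It assumes that the salience of a date can be approximated by how often that date is mentioned in a temporally analyzed corpus, that query relevance can be approximated by term overlap, and that the two are combined linearly. The names `Event`, `date_salience`, `query_relevance`, `rank_events`, the weight `alpha`, and the sample data are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    day: date          # date of the event; also the key of its cluster
    description: str   # one-sentence description of the event


def tokenize(text: str) -> set[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    return {w.strip(".,;:!?\"'()").lower() for w in text.split()} - {""}


def query_relevance(event: Event, query: str) -> float:
    """Assumed relevance: fraction of query terms found in the description."""
    terms = tokenize(query)
    if not terms:
        return 0.0
    return len(terms & tokenize(event.description)) / len(terms)


def date_salience(date_mentions: list[date]) -> dict[date, float]:
    """Assumed salience: mention count of a date, normalized by the maximum."""
    counts = Counter(date_mentions)
    top = max(counts.values(), default=1)
    return {d: c / top for d, c in counts.items()}


def rank_events(clusters: dict[date, list[Event]], date_mentions: list[date],
                query: str, alpha: float = 0.5, k: int = 3) -> list[Event]:
    """Score events from all date clusters together and keep the k best.

    The linear mix controlled by alpha is an illustrative assumption.
    """
    salience = date_salience(date_mentions)
    scored = []
    for day, events in clusters.items():
        for ev in events:
            score = (alpha * salience.get(day, 0.0)
                     + (1 - alpha) * query_relevance(ev, query))
            scored.append((score, ev))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Present the selected events chronologically, as a timeline would.
    return sorted((ev for _, ev in scored[:k]), key=lambda ev: ev.day)


# Hypothetical toy data: three date clusters and corpus-wide date mentions.
d1, d2, d3 = date(2011, 1, 14), date(2011, 1, 20), date(2011, 2, 11)
clusters = {
    d1: [Event(d1, "President flees the country after protests."),
         Event(d1, "Stock market closes slightly higher.")],
    d2: [Event(d2, "Local football club wins match.")],
    d3: [Event(d3, "Protests force the president to resign.")],
}
mentions = [d1, d1, d1, d3, d3, d2]
timeline = rank_events(clusters, mentions, "president protests", k=2)
```

Ranking across clusters rather than within each one means that an irrelevant event on a frequently mentioned date can still lose to a relevant event on a less prominent date, which is the behavior the abstract attributes to combining temporal and textual evidence.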
