Explicit Diversification of Event Aspects for Temporal Summarization

During major events, such as emergencies and disasters, a large volume of information is reported on newswire and social media platforms. Temporal summarization (TS) approaches are used to automatically produce concise overviews of such events by extracting text snippets from related articles over time. Current TS approaches rely on a combination of event relevance and textual novelty for snippet selection. However, for events that span multiple days, textual novelty is often a poor criterion for selecting snippets, since many snippets are textually unique but semantically redundant or non-informative. In this article, we propose a framework for the diversification of snippets using explicit event aspects, building on recent work in search result diversification. In particular, we first propose two techniques to identify explicit aspects that a user might want to see covered in a summary for different types of events. We then extend a state-of-the-art explicit diversification framework to maximize the coverage of these aspects when selecting summary snippets for unseen events. Through experimentation over the TREC TS 2013, 2014, and 2015 datasets, we show that explicit diversification for temporal summarization significantly outperforms classical novelty-based diversification, as the use of explicit event aspects reduces the number of redundant and off-topic snippets returned, while also increasing summary timeliness.
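
To illustrate the idea of selecting snippets by aspect coverage rather than textual novelty, the following is a minimal sketch of greedy explicit diversification. It assumes an xQuAD-style objective, which the abstract does not name; the function `select_snippets`, the parameter `lam`, the aspect names, and all scores are hypothetical and chosen only for illustration. It is not the article's implementation.

```python
from typing import Dict, List


def select_snippets(
    relevance: Dict[str, float],
    aspect_weight: Dict[str, float],
    coverage: Dict[str, Dict[str, float]],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedily pick up to k snippets, trading off relevance and aspect coverage.

    Assumed (hypothetical) inputs:
      relevance[s]      estimated relevance of snippet s to the event
      aspect_weight[a]  importance of aspect a for this type of event
      coverage[s][a]    estimated degree to which snippet s covers aspect a
    """
    selected: List[str] = []
    # not_covered[a]: how much of aspect a the current summary still leaves open
    not_covered = {a: 1.0 for a in aspect_weight}
    candidates = set(relevance)

    def score(s: str) -> float:
        # A snippet gains diversity credit only for aspects not yet covered.
        diversity = sum(
            aspect_weight[a] * coverage[s].get(a, 0.0) * not_covered[a]
            for a in aspect_weight
        )
        return (1.0 - lam) * relevance[s] + lam * diversity

    while candidates and len(selected) < k:
        # sorted() makes tie-breaking deterministic
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        # Discount the aspects that the chosen snippet already covers.
        for a in aspect_weight:
            not_covered[a] *= 1.0 - coverage[best].get(a, 0.0)
    return selected


# Toy data: s1 and s2 both report casualties; s3 reports the location.
relevance = {"s1": 0.9, "s2": 0.85, "s3": 0.6}
aspect_weight = {"casualties": 0.5, "location": 0.5}
coverage = {
    "s1": {"casualties": 0.9},
    "s2": {"casualties": 0.9},
    "s3": {"location": 0.8},
}

diversified = select_snippets(relevance, aspect_weight, coverage, k=2, lam=0.5)
relevance_only = select_snippets(relevance, aspect_weight, coverage, k=2, lam=0.0)
print(diversified)
print(relevance_only)
```

In this toy setting, the second casualty snippet is textually distinct from the first but covers an aspect the summary already contains, so the aspect-aware selection prefers the less relevant snippet about the location. Setting `lam` to 0 falls back to ranking by relevance alone.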
