Incremental Update Summarization: Adaptive Sentence Selection based on Prevalence and Novelty

The automatic summarization of long-running events from news streams is a challenging problem. A long-running event can contain hundreds of unique 'nuggets' of information to summarize, spread out over its lifetime. Meanwhile, information reported about the event can rapidly become outdated and is often highly redundant. Incremental update summarization (IUS) aims to select sentences from news streams to issue as updates to the user, summarizing the event over time. The updates issued should cover all of the key nuggets concisely, and should do so before the information contained in those nuggets becomes outdated. Prior summarization approaches can fail when applied to IUS, since they define a fixed summary length that cannot account for the different magnitudes and varying rates of development of such events. In this paper, we propose a novel IUS approach that adaptively alters the volume of content issued as updates over time, with respect to the prevalence and novelty of discussions about the event. It uses existing state-of-the-art summarization techniques to rank candidate sentences, followed by a supervised regression model that balances novelty, nugget coverage and timeliness when selecting sentences from the top ranks. We empirically evaluate our approach using the TREC 2013 Temporal Summarization dataset, extended with additional assessments. Our results show that by adaptively adjusting the number of sentences to select over time, our approach can nearly double the performance of effective summarization baselines.
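The selection step described above, ranking candidates and then choosing how many to issue based on novelty, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the Jaccard token overlap, the `max_sim` threshold, and the `toy_cutoff` function (a hypothetical stand-in for the supervised regression model) are all assumptions made for illustration, and timeliness is not modeled.

```python
from typing import Callable, List, Sequence, Set


def tokens(sentence: str) -> Set[str]:
    """Lowercased set of word types; a stand-in for real preprocessing."""
    return set(sentence.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-overlap similarity in [0, 1]."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def toy_cutoff(n_candidates: int, mean_novelty: float) -> int:
    """Hypothetical replacement for a learned regression model:
    issue more sentences when many candidates are novel."""
    return int(n_candidates * mean_novelty)


def select_updates(
    ranked: Sequence[str],
    issued: Sequence[str],
    predict_cutoff: Callable[[int, float], int] = toy_cutoff,
    max_sim: float = 0.5,
) -> List[str]:
    """Pick a variable number of sentences from an already ranked list.

    ranked: candidates, best first (the ranking itself is assumed given).
    issued: updates already sent to the user in earlier time windows.
    """
    seen = [tokens(s) for s in issued]

    def novelty(sentence: str) -> float:
        t = tokens(sentence)
        return 1.0 - max((jaccard(t, u) for u in seen), default=0.0)

    # Summarize how novel the current window is, then choose a rank cutoff.
    mean_novelty = sum(novelty(s) for s in ranked) / len(ranked) if ranked else 0.0
    k = max(0, min(len(ranked), predict_cutoff(len(ranked), mean_novelty)))

    selected: List[str] = []
    for sentence in ranked[:k]:
        t = tokens(sentence)
        # Skip sentences too similar to anything issued or selected so far.
        if all(jaccard(t, u) <= max_sim for u in seen):
            selected.append(sentence)
            seen.append(t)
    return selected
```

Because the cutoff is computed per time window rather than fixed in advance, a window that mostly repeats earlier updates yields few or no new sentences, while a window with new developments yields more.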
