Real-Time Web Scale Event Summarization Using Sequential Decision Making

We present a system based on sequential decision making for the online summarization of massive document streams, such as those found on the web. Given an event of interest (e.g., "Boston Marathon bombing"), our system filters the stream for relevance and produces a series of short text updates describing the event as it unfolds over time. Unlike previous work, our approach jointly models the relevance, comprehensiveness, novelty, and timeliness required by time-sensitive queries. We demonstrate a 28.3% improvement in summary F1 and a 43.8% improvement in time-sensitive F1 metrics.
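The abstract frames stream summarization as a sequence of decisions: as each sentence arrives, the system either emits it as an update or skips it. Below is a minimal sketch of that select-or-skip loop. It is not the authors' learned policy. The function names (`bow`, `cosine`, `summarize_stream`), the bag-of-words cosine features standing in for relevance and novelty, and the hand-set weights and threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from typing import Iterable, List


def bow(text: str) -> Counter:
    # Hypothetical feature: lowercased whitespace tokens as a bag of words.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bags of words (0.0 if either is empty).
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def summarize_stream(query: str, stream: Iterable[str],
                     w_rel: float = 1.0, w_nov: float = 1.0,
                     threshold: float = 1.1) -> List[str]:
    """Make one select-or-skip decision per incoming sentence.

    The weights and threshold are hypothetical fixed values; a trained
    system would learn this decision rule instead.
    """
    q = bow(query)
    updates: List[str] = []
    update_vecs: List[Counter] = []
    for sentence in stream:
        v = bow(sentence)
        # Relevance: similarity of the sentence to the event query.
        relevance = cosine(q, v)
        # Novelty: 1 minus the highest similarity to any emitted update.
        redundancy = max((cosine(v, u) for u in update_vecs), default=0.0)
        novelty = 1.0 - redundancy
        # Decision: emit the sentence now, or skip it permanently.
        if w_rel * relevance + w_nov * novelty > threshold:
            updates.append(sentence)
            update_vecs.append(v)
    return updates


if __name__ == "__main__":
    stream = [
        "explosions reported near the boston marathon finish line",
        "explosions reported near the boston marathon finish line",
        "local sports team wins game",
        "police confirm two bombs at boston marathon",
    ]
    for update in summarize_stream("boston marathon bombing", stream):
        print(update)
```

On this toy stream, the duplicate sentence is skipped because its novelty is zero, and the off-topic sentence is skipped because its relevance is zero. Only the first and last sentences are emitted. Because each decision is made as soon as a sentence arrives, timeliness follows from the loop structure: an update is emitted at the moment it is selected rather than after the stream ends.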
