Preserve Integrity in Realtime Event Summarization

Online text streams such as Twitter are a major information source for users following ongoing events. Realtime event summarization aims to generate and update coherent, concise summaries that describe the state of a given event. Because of the enormous volume of continuously arriving text, realtime event summarization has become the de facto tool for facilitating information acquisition. However, current text summarization techniques leave a challenging issue unexplored: how to preserve integrity, i.e., the accuracy and consistency of summaries, during the update process. The issue is critical because online text streams are dynamic and conflicting information can spread during the event period. For example, conflicting numbers of deaths and injuries might be reported after an earthquake. Such misleading information should not appear in the earthquake summary at any timestamp. In this article, we present a novel realtime event summarization framework called IAEA (Integrity-Aware Extractive-Abstractive realtime event summarization). Our key idea is to integrate an inconsistency detection module into a unified extractive–abstractive framework. In each update, important new tweets are first extracted by an extractive module, and the extraction is refined by explicitly detecting inconsistency between the new tweets and previous summaries. The extractive module captures sentence-level attention, which an abstractive module then uses to obtain word-level attention. Finally, the word-level attention is leveraged to rephrase words. We conduct comprehensive experiments on real-world datasets. To reduce the effort required to build sufficient training data, we also provide automatic labeling steps whose effectiveness has been empirically verified. Through experiments, we demonstrate that IAEA generates better summaries with consistent information than state-of-the-art approaches.
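The update step described in the abstract can be illustrated with a small sketch. It is not the IAEA model, which is learned and neural. The regex-based conflict rule, the choice to simply hold back a conflicting tweet, and the names `extract_facts`, `is_inconsistent`, `update_summary`, and `word_attention` are all assumptions made for illustration. The last function shows one common way (again an assumption, not necessarily the authors' formulation) to let sentence-level attention modulate word-level attention: scale each word score by the attention of its sentence and renormalize.

```python
import re

# Toy pattern for "<number> <casualty keyword>"; purely illustrative.
FACT = re.compile(r"(\d+)\s+(dead|deaths|injured|injuries)")
CANON = {"dead": "deaths", "deaths": "deaths",
         "injured": "injuries", "injuries": "injuries"}


def extract_facts(text):
    """Return {category: number} for simple numeric casualty statements."""
    return {CANON[word]: int(num) for num, word in FACT.findall(text.lower())}


def is_inconsistent(tweet, summary):
    """Hypothetical rule: a tweet conflicts with the summary if it reports
    a different number for a category the summary already covers."""
    old = extract_facts(" ".join(summary))
    new = extract_facts(tweet)
    return any(key in old and old[key] != value for key, value in new.items())


def update_summary(summary, scored_tweets, k=2):
    """Append the k highest-scoring new tweets that do not conflict
    with the previous summary (conflicting tweets are held back)."""
    ranked = sorted(scored_tweets, key=lambda pair: pair[1], reverse=True)
    kept = [text for text, _ in ranked if not is_inconsistent(text, summary)]
    return summary + kept[:k]


def word_attention(word_scores, sentence_attention, sentence_of_word):
    """Scale each word score by its sentence's attention, then renormalize."""
    scaled = [w * sentence_attention[s]
              for w, s in zip(word_scores, sentence_of_word)]
    total = sum(scaled)
    if total == 0:
        raise ValueError("attention mass is zero")
    return [x / total for x in scaled]


summary = ["Earthquake hits the city, 12 dead"]
tweets = [
    ("Reports say 30 dead after quake", 0.9),   # conflicts with "12 dead"
    ("Rescue teams arrive downtown", 0.7),
    ("40 injured taken to hospital", 0.6),      # new category, no conflict
]
updated = update_summary(summary, tweets)
weights = word_attention([0.5, 0.5, 0.5, 0.5], [0.8, 0.2], [0, 0, 1, 1])
```

In this toy run the highest-scoring tweet is held back because its death count disagrees with the existing summary, while the injury report is kept because the summary has no earlier injury figure to contradict. A real system would need a learned detector, since casualty counts legitimately change as an event unfolds.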
