Exploring events and distributed representations of text in multi-document summarization

We explore an event detection framework to improve multi-document summarization. We use distributed representations of text to address different lexical realizations. Summarization is based on the hierarchical combination of single-document summaries. We performed an automatic evaluation and a human study of the generated summaries. Quantitative and qualitative results show clear improvements over the state of the art.

In this article, we explore an event detection framework to improve multi-document summarization. Our approach is based on a two-stage single-document method that extracts a collection of key phrases, which are then used in a centrality-as-relevance passage retrieval model. We explore how to adapt this single-document method into multi-document summarization methods that can use event information. The event detection method is based on Fuzzy Fingerprints, a supervised method trained on documents with annotated event tags. To cope with the use of different terms to describe the same event, we explore distributed representations of text in the form of word embeddings, which helped improve the summarization results. The proposed summarization methods are based on the hierarchical combination of single-document summaries. The automatic evaluation and the human study show that these methods improve upon current state-of-the-art multi-document summarization systems on two mainstream evaluation datasets, DUC 2007 and TAC 2009. We show a relative improvement in ROUGE-1 scores of 16% for TAC 2009 and of 17% for DUC 2007.
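
The hierarchical combination of single-document summaries can be illustrated with a minimal sketch. It rests on loud assumptions: the single-document step here is a plain bag-of-words centrality score (cosine similarity between each sentence and the whole text), which stands in for, and is not, the key-phrase and centrality-as-relevance model described in the article; event information and word embeddings are left out entirely. All function names (`summarize`, `hierarchical_summary`, and the helpers) are hypothetical.

```python
import math
import re
from collections import Counter


def sentences(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag(text):
    """Lowercased bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def summarize(text, n):
    """Assumed stand-in for the single-document step: keep the n sentences
    most similar to the whole text, returned in their original order."""
    sents = sentences(text)
    whole = bag(text)
    ranked = sorted(
        range(len(sents)),
        key=lambda i: cosine(bag(sents[i]), whole),
        reverse=True,
    )
    return " ".join(sents[i] for i in sorted(ranked[:n]))


def hierarchical_summary(documents, per_doc=2, final=2):
    """Summarize each document, then summarize the concatenated summaries."""
    singles = [summarize(d, per_doc) for d in documents]
    return summarize(" ".join(singles), final)


if __name__ == "__main__":
    docs = [
        "The storm hit the coast on Monday. Thousands lost power. A cat slept.",
        "The storm damaged the coast. Officials said power returned Tuesday. "
        "Weather was mild elsewhere.",
    ]
    print(hierarchical_summary(docs))
```

The two-level structure is the point of the sketch: the second pass only sees sentences that survived a single-document pass, so any better single-document summarizer could be swapped in for `summarize` without changing `hierarchical_summary`.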
