Multi-document Summarization Based on Atomic Semantic Events and Their Temporal Relationships

Automatic multi-document summarization (MDS) is the process of extracting the most important information, such as events and entities, from multiple natural language texts focused on the same topic. In this paper, we examine how different groups of information, such as events and named entities, affect generic and update MDS. Our generic MDS system outperforms the best recent generic MDS systems on DUC 2004 in terms of ROUGE-1 recall and \(F_1\)-measure. Update summarization is a newer form of MDS in which novel yet salient sentences are chosen as summary sentences, on the assumption that the user has already read a given set of documents. We present an event-based update summarization system in which novelty is detected from the temporal ordering of events and saliency is ensured by the distribution of events and entities. To our knowledge, no other study has examined in depth the effect of novelty information acquired from the temporal ordering of events (assuming that a sentence contains one or more events) on update multi-document summarization. Our update MDS system outperforms the state-of-the-art update MDS system in terms of ROUGE-2 and ROUGE-SU4 recall. All of our MDS systems also generate high-quality summaries, as confirmed by manual evaluation against commonly used criteria.
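
For readers unfamiliar with the ROUGE-1 measures mentioned above, the following is a minimal sketch of unigram recall, precision, and \(F_1\) between one candidate summary and one reference summary. It is a simplification and not the official ROUGE package used in the evaluation: the function name `rouge_1`, the lowercasing, and the whitespace tokenization are assumptions made for illustration, and the sketch ignores stemming, stop-word handling, and multiple references.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Simplified ROUGE-1 scores (hypothetical helper, not the official toolkit)."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each unigram counts at most as often as it
    # appears in the other text (multiset intersection).
    overlap = sum((cand_counts & ref_counts).values())

    ref_total = sum(ref_counts.values())
    cand_total = sum(cand_counts.values())

    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


scores = rouge_1("the cat sat on the mat", "the cat lay on the mat")
print(scores)
```

In this toy pair, five of the six reference unigrams also occur in the candidate, so recall is 5/6. ROUGE-2 follows the same pattern with bigrams in place of unigrams.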
