GraphSum: Discovering correlations among multiple terms for graph-based summarization

Abstract: Graph-based summarization extracts a worthwhile subset of sentences from a collection of textual documents by using a graph-based model to represent the correlations between pairs of document terms. However, because the high-order correlations among multiple terms are disregarded during graph evaluation, summarization performance may be limited unless ad hoc language-dependent or semantics-based analysis is integrated. This paper presents a novel, general-purpose graph-based summarizer named GraphSum (Graph-based Summarizer). It discovers and exploits association rules to represent the correlations among multiple terms that previous approaches have neglected. The graph nodes, which represent combinations of two or more terms, are first ranked by means of a PageRank strategy that discriminates between positive and negative term correlations. The resulting node ranking then drives the sentence selection process. Experiments performed on benchmark and real-life documents demonstrate the effectiveness of the proposed approach compared to many state-of-the-art summarizers.
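
The pipeline outlined in the abstract (mine term combinations, rank them on a graph, then pick sentences) can be illustrated with a minimal sketch. This is not the authors' implementation. The simplifications below are all assumptions made for illustration: nodes are restricted to frequent term pairs, edge weights are derived from lift, negatively correlated pairs (lift at or below 1) are simply left unconnected, and sentences are scored by summing the ranks of the nodes they contain. The function names `mine_pairs`, `build_graph`, `pagerank`, and `summarize` are hypothetical.

```python
from itertools import combinations


def support(itemset, sentences):
    """Fraction of sentences (as term sets) that contain every term in itemset."""
    return sum(itemset <= s for s in sentences) / len(sentences)


def mine_pairs(sentences, min_support):
    """Return the term pairs whose support is positive and at least min_support."""
    terms = sorted(set().union(*sentences))
    pairs = []
    for pair in combinations(terms, 2):
        sup = support(frozenset(pair), sentences)
        if sup > 0 and sup >= min_support:
            pairs.append(frozenset(pair))
    return pairs


def build_graph(nodes, sentences):
    """Connect two nodes when their lift exceeds 1 (assumed 'positive' correlation)."""
    weights = {n: {} for n in nodes}
    for a, b in combinations(nodes, 2):
        lift = support(a | b, sentences) / (
            support(a, sentences) * support(b, sentences)
        )
        if lift > 1.0:  # assumption: negative correlations get no edge
            weights[a][b] = weights[b][a] = lift - 1.0
    return weights


def pagerank(weights, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration; isolated nodes spread rank uniformly."""
    nodes = list(weights)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            total = sum(weights[v].values())
            if total == 0:
                for u in nodes:
                    new[u] += damping * rank[v] / n
            else:
                for u, w in weights[v].items():
                    new[u] += damping * rank[v] * w / total
        rank = new
    return rank


def summarize(sentences, k, min_support=0.3):
    """Return the indices (in document order) of the k highest-scoring sentences."""
    sets = [frozenset(s) for s in sentences]
    nodes = mine_pairs(sets, min_support)
    if not nodes:
        return []
    rank = pagerank(build_graph(nodes, sets))
    # A sentence's score is the total rank of the term pairs it fully contains.
    scores = [sum(r for node, r in rank.items() if node <= s) for s in sets]
    order = sorted(range(len(sets)), key=lambda i: (-scores[i], i))
    return sorted(order[:k])
```

Calling `summarize(tokenized_sentences, k=2)` on a list of token lists returns the positions of the selected sentences. The paper's ranking strategy distinguishes positive from negative correlations inside PageRank itself; dropping negative edges, as done here, is only one simple stand-in for that idea.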
