Extractive Summarization via Overlap-Based Optimized Picking

Optimization-based methods treat summarization as a combinatorial optimization problem and formulate the objective as a weighted linear combination of criterion metrics. However, because these metrics are inconsistent with one another, it is hard to set proper weights. A subjectivity problem also arises, since most such methods summarize the original texts. In this paper, we propose an overlap-based greedy picking (OGP) algorithm for citation-based extractive summarization. In the algorithm, an overlap is defined as a sentence that contains several topics. Since including overlaps in summaries indirectly affects salience, summary size, and content redundancy, OGP avoids the problem of inconsistent metrics while dynamically involving these criteria in the optimization. Although it is a greedy method, OGP is proven to achieve above \((1-1/e)\) of the optimal solution. Since citation contexts are composed of objective evaluations, OGP also addresses the subjectivity problem. Our experimental results show that OGP outperforms the baseline methods and that the various criteria are effectively involved under the control of a single parameter \(\beta\).
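
To make the greedy-picking idea concrete, the following is a minimal sketch of a generic greedy maximum-coverage selection, not the authors' OGP algorithm. It assumes each sentence has already been mapped to a set of topic labels and that the summary is limited to a fixed number of sentences; the names `greedy_pick`, `sentence_topics`, and `budget` are hypothetical, and the parameter \(\beta\) is not modeled. For a monotone submodular objective such as topic coverage under a cardinality constraint, this kind of greedy selection is known to reach at least \((1-1/e)\) of the optimum, which is the type of bound the abstract refers to.

```python
from typing import Dict, List, Set


def greedy_pick(sentence_topics: Dict[str, Set[str]], budget: int) -> List[str]:
    """Greedily pick up to `budget` sentences that maximize topic coverage.

    Illustrative assumption: `sentence_topics` maps a sentence id to the set
    of topics it contains. Sentences covering several topics (overlaps) tend
    to be picked first because they add the most uncovered topics.
    """
    covered: Set[str] = set()
    summary: List[str] = []
    remaining = dict(sentence_topics)  # shallow copy; the input is not mutated

    while remaining and len(summary) < budget:
        # Marginal gain = number of not-yet-covered topics a sentence adds.
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        if not remaining[best] - covered:
            break  # no remaining sentence adds a new topic
        summary.append(best)
        covered |= remaining.pop(best)

    return summary


# Toy input (made up for illustration): four citation sentences, four topics.
example = {
    "s1": {"t1", "t2", "t3"},
    "s2": {"t1"},
    "s3": {"t3", "t4"},
    "s4": {"t2"},
}
print(greedy_pick(example, budget=2))
```

In this toy input, the multi-topic sentence `s1` is selected first, and redundant single-topic sentences are skipped once their topics are covered, which is how favoring overlaps can limit both summary size and redundancy without separate weights.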
