Paper information: Single-Document Summarization as a Tree Knapsack Problem

Recent studies on extractive text summarization formulate it as a combinatorial optimization problem such as a Knapsack Problem, a Maximum Coverage Problem or a Budgeted Median Problem. These methods successfully improved summarization quality, but they did not consider the rhetorical relations between the textual units of a source document. Thus, summaries generated by these methods may lack logical coherence. This paper proposes a single-document summarization method based on the trimming of a discourse tree. This is a two-fold process. First, we propose rules for transforming a rhetorical structure theory-based discourse tree into a dependency-based discourse tree, which allows us to take a tree-trimming approach to summarization. Second, we formulate the problem of trimming a dependency-based discourse tree as a Tree Knapsack Problem, then solve it with integer linear programming (ILP). Evaluation results showed that our method improved ROUGE scores.
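To make the tree-trimming idea concrete, the following is a minimal sketch of a Tree Knapsack Problem on a toy dependency tree. It is not the paper's implementation: the abstract states that the problem is solved with ILP, whereas this sketch swaps in a small dynamic program over the tree so that it runs without a solver. The function name `tree_knapsack`, the parent-array encoding, and the scores and lengths are all hypothetical. It assumes the standard tree knapsack constraint that a unit may be kept only if its parent is kept, so the selection is always a subtree containing the root.

```python
from typing import Dict, List, Tuple


def tree_knapsack(parent: List[int], score: List[float],
                  length: List[int], budget: int) -> List[int]:
    """Pick a rooted subtree with maximum total score within a length budget.

    parent[i] is the index of unit i's parent, or -1 for the root.
    A unit can only be selected if its parent is selected.
    """
    children: Dict[int, List[int]] = {i: [] for i in range(len(parent))}
    root = -1
    for node, par in enumerate(parent):
        if par == -1:
            root = node
        else:
            children[par].append(node)

    def solve(node: int) -> Dict[int, Tuple[float, List[int]]]:
        # Maps used length -> (best score, selected units) over all
        # selections inside this subtree that include `node` itself.
        if length[node] > budget:
            return {}
        table = {length[node]: (score[node], [node])}
        for child in children[node]:
            child_table = solve(child)
            merged = dict(table)  # option: skip this child entirely
            for w1, (s1, sel1) in table.items():
                for w2, (s2, sel2) in child_table.items():
                    w = w1 + w2
                    if w <= budget and (w not in merged or s1 + s2 > merged[w][0]):
                        merged[w] = (s1 + s2, sel1 + sel2)
            table = merged
        return table

    table = solve(root)
    if not table:
        return []  # even the root does not fit in the budget
    _, selected = max(table.values(), key=lambda entry: entry[0])
    return sorted(selected)


# Toy tree (hypothetical values): unit 0 is the root, units 1 and 2 depend
# on it, unit 3 depends on 1, and unit 4 depends on 2.
parent = [-1, 0, 0, 1, 2]
score = [1.0, 3.0, 2.0, 4.0, 1.0]
length = [10, 8, 6, 7, 5]
summary_units = tree_knapsack(parent, score, length, budget=25)
```

In this toy example, unit 3 has the highest score but can only enter the summary together with its parent, unit 1, which is the kind of structural constraint a plain knapsack formulation would not enforce.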
