Non-decreasing Sub-modular Function for Comprehensible Summarization

Extractive summarization techniques typically aim to maximize the information coverage of the summary with respect to the original corpus and report accuracy in terms of ROUGE scores. However, automated text summarization should also account for comprehensibility, coherence, and readability. In this work, we identify the discourse structure that provides the context in which a sentence is created, and we leverage this structural information to frame a monotone (non-decreasing) submodular scoring function for generating comprehensible summaries. Our approach improves the overall comprehensibility of the summary under human evaluation while providing sufficient content coverage with comparable ROUGE scores. We also formulate a metric that measures summary comprehensibility in terms of the Contextual Independence of a sentence, and we show that this metric is representative of human judgement of text comprehensibility.
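The monotone submodular framing matters because it admits a simple greedy selection with a (1 - 1/e) approximation guarantee. The sketch below illustrates the general mechanism with a plain word-coverage objective; this stand-in scoring function, the `greedy_summary` helper, and the toy sentences are illustrative assumptions, not the paper's comprehensibility-aware objective.

```python
# Illustrative sketch: greedy maximization of a monotone (non-decreasing)
# submodular function for extractive summarization. The coverage objective
# here is a stand-in, NOT the paper's discourse-aware scoring function.

def coverage(selected, sentences):
    """f(S) = number of distinct content words covered by selected sentences.
    Coverage is monotone (adding a sentence never decreases it) and
    submodular (marginal gains shrink as the summary grows)."""
    covered = set()
    for i in selected:
        covered |= sentences[i]
    return len(covered)

def greedy_summary(sentences, budget):
    """Pick up to `budget` sentences, each time taking the largest marginal
    gain; for monotone submodular f this greedy choice is within (1 - 1/e)
    of the optimal summary."""
    selected = []
    for _ in range(budget):
        remaining = [i for i in range(len(sentences)) if i not in selected]
        if not remaining:
            break
        best = max(
            remaining,
            key=lambda i: coverage(selected + [i], sentences)
            - coverage(selected, sentences),
        )
        if coverage(selected + [best], sentences) == coverage(selected, sentences):
            break  # no remaining sentence adds new information
        selected.append(best)
    return selected

# Toy corpus: each "sentence" is its set of content words.
docs = [
    {"summarization", "coverage", "rouge"},
    {"coverage", "rouge"},               # redundant with sentence 0
    {"coherence", "readability"},
]
print(greedy_summary(docs, 2))  # → [0, 2]: skips the redundant sentence 1
```

Diminishing returns is what makes the greedy step safe here: once sentence 0 is chosen, sentence 1 contributes zero marginal gain, so the selector moves on to the novel content in sentence 2.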
