From Extracting to Abstracting: Generating Quasi-abstractive Summaries

In this paper, we investigate quasi-abstractive summaries, a new type of machine-generated summary that uses only fragments of source sentences rather than whole sentences. Quasi-abstractive summaries aim to bridge the gap between human-written abstracts and extractive summaries. We present an approach that first learns to identify sets of source sentences, where each set contains fragments that can be used to produce one sentence of the abstract, and then uses these sets to produce the abstract itself. Our experiments show very promising results. Importantly, we obtain our best results when summary generation is anchored by the most salient noun phrases predicted from the text to be summarized.
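To make the idea of sentence sets anchored by salient noun phrases more concrete, here is a toy Python sketch. It is not the method described above. That method learns which sentences belong together, whereas this sketch assumes the salient noun phrases are already given and groups source sentences by case-insensitive substring match on each phrase. The helper name `group_sentences_by_anchor` and the example sentences are hypothetical.

```python
def group_sentences_by_anchor(sentences, salient_nps):
    """Hypothetical helper, not the paper's learned model.

    Maps each salient noun phrase to the indices of the source sentences
    that mention it. Each resulting set stands in for the sentences whose
    fragments might feed one sentence of the abstract.
    """
    groups = {}
    for phrase in salient_nps:
        key = phrase.lower()
        # Crude anchoring: a sentence joins the set if it contains the phrase.
        indices = [i for i, s in enumerate(sentences) if key in s.lower()]
        if indices:  # drop anchors that never occur in the source
            groups[phrase] = indices
    return groups


if __name__ == "__main__":
    source = [
        "The drug reduced blood pressure in trials.",
        "Side effects were mild.",
        "Blood pressure fell by ten percent on average.",
        "The drug was well tolerated.",
    ]
    print(group_sentences_by_anchor(source, ["blood pressure", "the drug"]))
    # prints {'blood pressure': [0, 2], 'the drug': [0, 3]}
```

Substring matching is deliberately simple (for instance, "the drug" would also match "the drugstore"). A real system would work with parsed noun phrases and use a learned model to decide set membership and to choose which fragments to keep.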