Summary Cloze: A New Task for Content Selection in Topic-Focused Summarization

A key challenge in topic-focused summarization is determining what information should be included in the summary, a problem known as content selection. In this work, we propose a new method for studying content selection in topic-focused summarization called the summary cloze task. The goal of the summary cloze task is to generate the next sentence of a summary conditioned on the beginning of the summary, a topic, and a reference document(s). The main challenge is deciding what information in the references is relevant to the topic and partial summary and should be included in the summary. Although the cloze task does not address all aspects of the traditional summarization problem, the more narrow scope of the task allows us to collect a large-scale datset of nearly 500k summary cloze instances from Wikipedia. We report experimental results on this new dataset using various extractive models and a two-step abstractive model that first extractively selects a small number of sentences and then abstractively summarizes them. Our results show that the topic and partial summary help the models identify relevant content, but the task remains a significant challenge.

[1]  Mor Naaman,et al.  Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies , 2018, NAACL.

[2]  Hoa Trang Dang,et al.  Overview of DUC 2005 , 2005 .

[3]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[4]  Christopher D. Manning,et al.  Get To The Point: Summarization with Pointer-Generator Networks , 2017, ACL.

[5]  Ilya Sutskever,et al.  Language Models are Unsupervised Multitask Learners , 2019 .

[6]  David Konopnicki,et al.  Unsupervised Query-Focused Multi-Document Summarization using the Cross Entropy Method , 2017, SIGIR.

[7]  Zhi-Hong Deng,et al.  An Unsupervised Multi-Document Summarization Framework Based on Neural Document Model , 2016, COLING.

[8]  Bowen Zhou,et al.  SummaRuNNer: A Recurrent Neural Network Based Sequence Model for Extractive Summarization of Documents , 2016, AAAI.

[9]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[10]  Daniel Marcu,et al.  Bayesian Query-Focused Summarization , 2006, ACL.

[11]  Dragomir R. Radev,et al.  Using Random Walks for Question-focused Sentence Retrieval , 2005, HLT.

[12]  Ani Nenkova,et al.  Beyond SumBasic: Task-focused summarization with sentence simplification and lexical expansion , 2007, Information Processing & Management.

[13]  Wilson L. Taylor,et al.  “Cloze Procedure”: A New Tool for Measuring Readability , 1953 .

[14]  Hoa Trang Dang,et al.  Overview of DUC 2006 , 2006 .

[15]  Hui Lin,et al.  A Class of Submodular Functions for Document Summarization , 2011, ACL.

[16]  Stephen E. Robertson,et al.  Okapi at TREC-3 , 1994, TREC.

[17]  Stephen E. Robertson,et al.  GatfordCentre for Interactive Systems ResearchDepartment of Information , 1996 .

[18]  Kathleen R. McKeown,et al.  Understanding the process of multi-document summarization: content selection, rewriting and evaluation , 2006 .

[19]  Lukasz Kaiser,et al.  Generating Wikipedia by Summarizing Long Sequences , 2018, ICLR.

[20]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[21]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[22]  Prasenjit Mitra,et al.  WikiWrite: Generating Wikipedia Articles Automatically , 2016, IJCAI.

[23]  Mirella Lapata,et al.  Neural Summarization by Extracting Sentences and Words , 2016, ACL.

[24]  Alexander M. Rush,et al.  Abstractive Sentence Summarization with Attentive Recurrent Neural Networks , 2016, NAACL.

[25]  Bowen Zhou,et al.  Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond , 2016, CoNLL.

[26]  Luke S. Zettlemoyer,et al.  AllenNLP: A Deep Semantic Natural Language Processing Platform , 2018, ArXiv.

[27]  John M. Conroy,et al.  An Assessment of the Accuracy of Automatic Evaluation in Summarization , 2012, EvalMetrics@NAACL-HLT.

[28]  Kathleen McKeown,et al.  Content Selection in Deep Learning Models of Summarization , 2018, EMNLP.

[29]  Ani Nenkova,et al.  Summarization evaluation for text and speech: issues and approaches , 2006, INTERSPEECH.

[30]  Regina Barzilay,et al.  Automatically Generating Wikipedia Articles: A Structure-Aware Approach , 2009, ACL.

[31]  Chin-Yew Lin,et al.  ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.