Extending Multi-Document Summarization Evaluation to the Interactive Setting

Allowing users to interact with multi-document summarizers is a promising direction towards improving and customizing summary results. Different ideas for interactive summarization have been proposed in previous work, but these solutions are highly divergent and incomparable. In this paper, we develop an end-to-end evaluation framework for interactive summarization, focusing on expansion-based interaction, which considers the accumulating information along a user session. Our framework includes a procedure for collecting real user sessions, as well as evaluation measures that rely on summarization standards but are adapted to reflect interaction. All of our solutions and resources are publicly available as a benchmark, allowing comparison of future developments in interactive summarization and spurring progress in its methodological evaluation. We demonstrate the use of our framework by evaluating and comparing the baseline implementations we developed for this purpose, which will serve as part of our benchmark. Our extensive experimentation and analysis motivate the proposed evaluation framework's design and support its viability.
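To illustrate the kind of session-level measurement the abstract describes, below is a minimal sketch of scoring the text that accumulates over an interactive session against a reference summary after each expansion step. This is not the paper's actual implementation; it only assumes ROUGE as the underlying summarization standard, and the session steps and reference summary shown are hypothetical placeholders.

```python
# Minimal sketch: evaluate the accumulating session text after each interaction
# step, using ROUGE (via the rouge_score package) as the summarization metric.
# The session steps and reference summary below are hypothetical placeholders.
from rouge_score import rouge_scorer

# Hypothetical session: each step is the text newly revealed to the user.
session_steps = [
    "Initial summary text shown to the user.",
    "Additional detail revealed after the first expansion.",
    "Further detail revealed after the second expansion.",
]

# Hypothetical gold-standard reference summary for the document set.
reference = "A gold-standard reference summary of the document set."

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

accumulated = ""
for step_index, step_text in enumerate(session_steps, start=1):
    # Accumulate everything the user has seen so far, then score it.
    accumulated = (accumulated + " " + step_text).strip()
    scores = scorer.score(reference, accumulated)
    print(
        f"step {step_index}: "
        f"R1-F1={scores['rouge1'].fmeasure:.3f}, "
        f"RL-F1={scores['rougeL'].fmeasure:.3f}"
    )
```

Tracking such scores per step yields a curve over the session, so systems can be compared by how quickly relevant content accumulates rather than by a single fixed-length summary score.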
