A User Study on Snippet Generation: Text Reuse vs. Paraphrases

The snippets in the result list of a web search engine are built from sentences of the retrieved web pages that match the query. Reusing a web page's text for snippets has been considered fair use under the copyright laws of most jurisdictions. Recently, notable exceptions to this arrangement have emerged in Germany and Spain, where news publishers are entitled to raise claims under a so-called ancillary copyright. Similar legislation is currently being discussed at the European Commission. If this development gains momentum, the reuse of text for snippets will soon incur costs, which in turn will give rise to new solutions for generating truly original snippets. A key question in this regard is whether users will accept a new approach to snippet generation, or whether they will prefer the current model of "reuse snippets." The paper at hand gives a first answer. A crowdsourcing experiment along with a statistical analysis reveals that our test users exhibit no significant preference for either kind of snippet. Notwithstanding the technological difficulty, this result opens the door to a new snippet synthesis paradigm.
