Sparse Spatial Selection for Novelty-Based Search Result Diversification

Novelty-based diversification approaches aim to produce a diverse ranking by directly comparing the retrieved documents. However, since such approaches are typically greedy, they require O(n^2) document-document comparisons in order to diversify a ranking of n documents. In this work, we propose to model novelty-based diversification as a similarity search in a sparse metric space. In particular, we exploit the triangle inequality property of metric spaces in order to drastically reduce the number of required document-document comparisons. Thorough experiments using three TREC test collections show that our approach is at least as effective as existing novelty-based diversification approaches, while improving their efficiency by an order of magnitude.
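The role of the triangle inequality can be illustrated with a minimal sketch. It is not the algorithm evaluated in this work. It assumes documents are numeric vectors, that the distance is a true metric (Euclidean here), and that a document counts as novel when it lies at least `threshold` away from every document already selected. The function name `select_novel`, the single reference pivot, and the threshold rule are all hypothetical choices made for illustration. For a reference document r, the metric property gives |d(x, r) - d(s, r)| <= d(x, s) <= d(x, r) + d(s, r), so many comparisons between a candidate x and a selected document s can be decided from cached distances alone.

```python
import math


def select_novel(docs, threshold, dist=math.dist):
    """Greedy novelty selection over a ranked list of vectors.

    Hypothetical sketch: the first document serves as a single reference
    pivot, and triangle-inequality bounds are used to skip distance
    computations. Returns (selected_indices, number_of_distance_calls).
    """
    calls = 0

    def d(a, b):
        # Wrap the metric so that real distance computations are counted.
        nonlocal calls
        calls += 1
        return dist(a, b)

    if not docs:
        return [], 0

    ref = docs[0]
    selected = [0]
    to_ref = {0: 0.0}  # cached distance from each selected document to ref

    for i in range(1, len(docs)):
        x_ref = d(docs[i], ref)  # one real computation per candidate
        novel = True
        for s in selected:
            lower = abs(x_ref - to_ref[s])  # lower bound on d(x, s)
            upper = x_ref + to_ref[s]       # upper bound on d(x, s)
            if lower >= threshold:
                continue  # provably far enough; no computation needed
            if upper < threshold:
                novel = False  # provably too close; no computation needed
                break
            if d(docs[i], docs[s]) < threshold:
                novel = False  # bounds were inconclusive; computed exactly
                break
        if novel:
            selected.append(i)
            to_ref[i] = x_ref
    return selected, calls
```

For four well-separated points on a line, `select_novel([(0.0,), (10.0,), (20.0,), (30.0,)], 1.0)` selects every document using three distance computations, whereas comparing each candidate against every previously selected document would take six. How much is saved in practice depends on the data and on the choice of pivots.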
