Online selection of diverse results

The rapid growth in the volume of information easily accessible through web-based services has made it essential for service providers to give users personalized, representative summaries of that information. Further, online commercial services, including social networking and micro-blogging websites, e-commerce portals, and leisure and entertainment websites, recommend to users interesting content that is simultaneously diverse along many axes, such as topic and geographic specificity. The key algorithmic question in all these applications is how to generate a succinct, representative, and relevant summary from a large stream of data coming from a variety of sources. In this paper, we formally model this optimization problem, identify its key structural characteristics, and use these observations to design an extremely scalable and efficient algorithm. We analyze the algorithm theoretically and show that it always produces a nearly optimal solution. In addition, we perform large-scale experiments on both real-world and synthetically generated datasets, which confirm that in practice our algorithm performs even better than its analytical guarantees and outperforms other candidate algorithms for the problem by a wide margin.
