A survey of query result diversification

In information systems such as web search engines and databases, diversity is receiving increasing attention as a way to improve user satisfaction, which makes query result diversification an important research topic. Issues such as the definition of diversification and the efficient processing of diverse queries are especially challenging in these systems. Many researchers have addressed various dimensions of the diversification problem. In this survey, we provide a thorough review of a wide range of result diversification techniques, including the various definitions of diversification, the corresponding algorithms, diversification techniques tailored to specific applications (databases, search engines, recommendation systems, graphs, time series, and data streams), and result diversification systems. We also propose some open research directions that are challenging and have not yet been explored, with the aim of improving the quality of query results.
