Profile Diversity for Query Processing using User Recommendations

More than 90% of the queries submitted to content sharing platforms, such as Flickr, are vague, i.e., they contain only a few keywords, which complicates the task of returning interesting results. To overcome this limitation, many platforms use recommendation strategies to filter the results. However, recommendations tend to return highly redundant items. Content diversification has been studied as a solution to this problem, but it may suffer from at least two limitations: poor content description and semantic ambiguity. In this paper, we investigate profile diversity for searching web items. Profile diversification addresses the problem of returning redundant items and enhances the quality of diversification. We propose a threshold-based approach to return the most relevant and most popular documents while satisfying content and profile diversity constraints. Our approach includes a family of techniques for efficiently retrieving the desired documents. To evaluate our solution, we ran intensive experiments, including a user survey, on three datasets; in more than 75% of the cases, users rate profile diversity as similar to or better than the other approaches. Additionally, our optimization techniques reduce the response time by up to a factor of 12 compared to a baseline greedy diversification algorithm.

Highlights

- We propose a specific scoring function for content and profile diversification based on a probabilistic model.
- We propose a greedy threshold-based top-k algorithm that processes queries with our profile diversity score, using the concept of a candidate list.
- We propose various techniques for optimizing the computation of top-k diversified profiles.
- To evaluate the benefits of our scoring function and optimization techniques, we ran our algorithms on three datasets: two from Del.icio.us and one from Flickr. The results show that our approach increases the overall quality of recommendations and that our optimization strategies significantly reduce the response time of the diversified top-k computation.
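
To make the notion of a baseline greedy diversification algorithm more concrete, the following is a minimal sketch of a generic greedy selection loop. It is not the scoring function or the threshold-based algorithm of the paper: the names `greedy_diversify`, `jaccard_distance`, and `trade_off`, the use of tag sets to describe documents, and the linear combination of relevance and novelty are all assumptions made for illustration only.

```python
from typing import Dict, List, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """Return 1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def greedy_diversify(
    relevance: Dict[str, float],
    tags: Dict[str, Set[str]],
    k: int,
    trade_off: float = 0.5,
) -> List[str]:
    """Pick up to k documents, one at a time, by greedy marginal gain.

    Hypothetical scoring (an assumption, not the paper's model):
    gain = trade_off * relevance + (1 - trade_off) * novelty,
    where novelty is the smallest Jaccard distance to any document
    already selected (1.0 when nothing has been selected yet).
    """
    selected: List[str] = []
    candidates = set(relevance)

    while candidates and len(selected) < k:

        def gain(doc: str) -> float:
            if selected:
                novelty = min(
                    jaccard_distance(tags[doc], tags[s]) for s in selected
                )
            else:
                novelty = 1.0
            return trade_off * relevance[doc] + (1.0 - trade_off) * novelty

        # Sorting the candidates makes tie-breaking deterministic.
        best = max(sorted(candidates), key=gain)
        selected.append(best)
        candidates.remove(best)

    return selected


# Toy data: d1 and d2 are near-duplicates, d3 is less relevant but different.
relevance = {"d1": 0.9, "d2": 0.85, "d3": 0.5}
tags = {
    "d1": {"sunset", "beach"},
    "d2": {"sunset", "beach"},
    "d3": {"mountain", "snow"},
}
diverse = greedy_diversify(relevance, tags, k=2)
relevance_only = greedy_diversify(relevance, tags, k=2, trade_off=1.0)
```

On this toy input, `diverse` is `["d1", "d3"]` because the redundant `d2` gains nothing in novelty, whereas `relevance_only` is `["d1", "d2"]`. Each round rescans every remaining candidate against every selected document, which is the kind of cost that threshold-based pruning aims to avoid.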
