Personalized search result diversification via structured learning

This paper addresses the problem of personalized diversification of search results, with the goal of improving on the performance of both plain diversification and plain personalization algorithms. Previous work has tackled the problem mainly by means of unsupervised learning. To further enhance performance, we propose a supervised learning strategy. Specifically, we set up a structured learning framework for supervised personalized diversification that combines three kinds of features: those extracted directly from the tokens of documents, those used by unsupervised personalized diversification algorithms, and, importantly, those generated from our proposed user-interest latent Dirichlet topic model. This topic model lets our learning strategy estimate whether a document caters to a user's interest. We also define two constraints in our structured learning framework to ensure that search results are both diversified and consistent with a user's interest. We conduct experiments on an open personalized diversification dataset and find that our supervised learning strategy outperforms unsupervised personalized diversification methods as well as plain personalization and plain diversification methods.
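
To make the goal concrete, the following is a minimal sketch of the general idea of ranking for relevance, diversity, and user interest at once. It is not the structured learning method proposed in the paper: it is a simple greedy, MMR-style re-ranker, and every name in it (`cosine`, `greedy_personalized_diversify`, the weights `w_rel`, `w_user`, `w_div`) is a hypothetical chosen for illustration. It assumes each document already has a relevance score and a topic distribution, and that the user's interests are given as a topic distribution of the same length, for example from some topic model.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def greedy_personalized_diversify(docs, user_topics, k,
                                  w_rel=1.0, w_user=1.0, w_div=1.0):
    """Greedily pick k documents (illustrative only, hand-set weights).

    docs: dict mapping doc_id -> (relevance, topic_vector).
    user_topics: the user's interest distribution over the same topics.
    """
    selected = []
    remaining = set(docs)
    while remaining and len(selected) < k:

        def gain(doc_id):
            relevance, topics = docs[doc_id]
            # How well the document matches the user's interests.
            interest = cosine(topics, user_topics)
            # Similarity to the closest document already selected.
            redundancy = max(
                (cosine(topics, docs[s][1]) for s in selected),
                default=0.0,
            )
            return w_rel * relevance + w_user * interest - w_div * redundancy

        # Sort first so that ties are broken deterministically by doc_id.
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected
```

In this sketch the three weights are fixed by hand. A supervised approach such as the one described above would instead learn how to weight its features from training data.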
