Efficient Structured Learning for Personalized Diversification

This paper is concerned with the problem of personalized diversification of search results, with the goal of outperforming both plain diversification and plain personalization algorithms. In previous work, the problem has mainly been tackled by means of unsupervised learning. To further enhance performance, we propose a supervised learning strategy. Specifically, we set up a structured learning framework for supervised personalized diversification that uses three kinds of features: features extracted directly from the tokens of documents, features used by unsupervised personalized diversification algorithms, and, importantly, features generated by our proposed user-interest latent Dirichlet topic model. We also define two constraints in our structured learning framework to ensure that search results are both diversified and consistent with a user's interests. To improve training efficiency, we propose a fast training framework that adds multiple highly violated but mutually diverse constraints at every iteration of the cutting-plane algorithm. Experiments on an open dataset show that our supervised learning strategy outperforms unsupervised personalized diversification methods as well as plain personalization and plain diversification methods. Our fast training framework substantially reduces training time while maintaining nearly the same performance.
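The abstract does not say how the additional constraints are chosen at each cutting-plane iteration. The following is a minimal sketch, assuming a greedy rule in the spirit of diverse M-best cutting planes: each candidate output is represented as a set of document ids, its margin violation is precomputed, and similarity between candidates is measured by Jaccard overlap. The function names `jaccard` and `select_diverse_violated` and the `diversity_weight` parameter are hypothetical illustrations, not the authors' implementation.

```python
from typing import FrozenSet, List, Sequence


def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Jaccard similarity between two candidate document subsets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def select_diverse_violated(
    candidates: Sequence[FrozenSet[int]],
    violations: Sequence[float],
    m: int,
    diversity_weight: float = 0.5,
) -> List[int]:
    """Greedily pick up to m constraints that are highly violated yet dissimilar.

    Hypothetical sketch: candidates[i] is a candidate output (a set of document
    ids) and violations[i] is its assumed margin violation, e.g.
    loss(y_true, y_i) + w . psi(x, y_i) - w . psi(x, y_true).
    Only candidates with a positive violation are eligible.
    """
    eligible = [i for i, v in enumerate(violations) if v > 0]
    chosen: List[int] = []
    while eligible and len(chosen) < m:
        # Score = violation minus a penalty for overlap with the most similar
        # constraint already chosen in this iteration.
        best = max(
            eligible,
            key=lambda i: violations[i]
            - diversity_weight
            * max((jaccard(candidates[i], candidates[j]) for j in chosen), default=0.0),
        )
        chosen.append(best)
        eligible.remove(best)
    return chosen


candidates = [frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({5, 6, 7}), frozenset({8})]
violations = [0.9, 0.8, 0.6, -0.1]
print(select_diverse_violated(candidates, violations, m=2))  # [0, 2]
print(select_diverse_violated(candidates, violations, m=2, diversity_weight=0.0))  # [0, 1]
```

With the diversity penalty, the second pick skips candidate 1, which shares two of four documents with the first pick, in favor of the less violated but disjoint candidate 2; setting `diversity_weight=0.0` reduces the rule to plain top-M selection by violation.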
