From Greedy Selection to Exploratory Decision-Making: Diverse Ranking with Policy-Value Networks

The goal of search result diversification is to select a subset of documents from the candidate set to satisfy as many different subtopics as possible. In general, it is a problem of subset selection and selecting an optimal subset of documents is NP-hard. Existing methods usually formalize the problem as ranking the documents with greedy sequential document selection. At each of the ranking position the document that can provide the largest amount of additional information is selected. It is obvious that the greedy selections inevitably produce suboptimal rankings. In this paper we propose to partially alleviate the problem with a Monte Carlo tree search (MCTS) enhanced Markov decision process (MDP), referred to as M$^2$Div. In M$^2$Div, the construction of diverse ranking is formalized as an MDP process where each action corresponds to selecting a document for one ranking position. Given an MDP state which consists of the query, selected documents, and candidates, a recurrent neural network is utilized to produce the policy function for guiding the document selection and the value function for predicting the whole ranking quality. The produced raw policy and value are then strengthened with MCTS through exploring the possible rankings at the subsequent positions, achieving a better search policy for decision-making. Experimental results based on the TREC benchmarks showed that M$^2$Div can significantly outperform the state-of-the-art baselines based on greedy sequential document selection, indicating the effectiveness of the exploratory decision-making mechanism in M$^2$Div.

[1]  Roi Blanco,et al.  A Concise Integer Linear Programming Formulation for Implicit Search Result Diversification , 2017, WSDM.

[2]  Filip Radlinski,et al.  Learning diverse rankings with multi-armed bandits , 2008, ICML '08.

[3]  Zheng Wen,et al.  Cascading Bandits: Learning to Rank in the Cascade Model , 2015, ICML.

[4]  Thorsten Joachims,et al.  Interactively optimizing information retrieval systems as a dueling bandits problem , 2009, ICML '09.

[5]  Demis Hassabis,et al.  Mastering the game of Go without human knowledge , 2017, Nature.

[6]  Olivier Chapelle,et al.  Expected reciprocal rank for graded relevance , 2009, CIKM.

[7]  Ji-Rong Wen,et al.  Supervised Search Result Diversification via Subtopic Attention , 2018, IEEE Transactions on Knowledge and Data Engineering.

[8]  Sreenivas Gollapudi,et al.  An axiomatic approach for result diversification , 2009, WWW '09.

[9]  Scott Sanner,et al.  Probabilistic latent maximal marginal relevance , 2010, SIGIR '10.

[10]  Ji-Rong Wen,et al.  Learning to Diversify Search Results via Subtopic Attention , 2017, SIGIR.

[11]  Charles L. A. Clarke,et al.  Novelty and diversity in information retrieval evaluation , 2008, SIGIR '08.

[12]  Wei Zeng,et al.  Adapting Markov Decision Process for Search Result Diversification , 2017, SIGIR.

[13]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[14]  Grace Hui Yang,et al.  A POMDP model for content-free document re-ranking , 2014, SIGIR.

[15]  Xueqi Cheng,et al.  Modeling Document Novelty with Neural Tensor Network for Search Result Diversification , 2016, SIGIR.

[16]  Sumit Bhatia Multidimensional search result diversification: diverse search results for diverse users , 2011, SIGIR '11.

[17]  Raymond J. Mooney,et al.  Learning to Disambiguate Search Queries from Short Sessions , 2009, ECML/PKDD.

[18]  Sreenivas Gollapudi,et al.  Diversifying search results , 2009, WSDM '09.

[19]  Guy Shani,et al.  An MDP-Based Recommender System , 2002, J. Mach. Learn. Res..

[20]  Jade Goldstein-Stewart,et al.  The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries , 1998, SIGIR Forum.

[21]  Ben Carterette,et al.  Probabilistic models of ranking novel documents for faceted topic retrieval , 2009, CIKM.

[22]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[23]  Xueqi Cheng,et al.  Directly Optimize Diversity Evaluation Measures , 2017, ACM Trans. Intell. Syst. Technol..

[24]  Grace Hui Yang,et al.  Win-win search: dual-agent stochastic game in session search , 2014, SIGIR.

[25]  Craig MacDonald,et al.  Exploiting query reformulations for web search result diversification , 2010, WWW '10.

[26]  Tetsuya Sakai,et al.  Evaluating Search Result Diversity using Intent Hierarchies , 2016, SIGIR.

[27]  Quoc V. Le,et al.  Distributed Representations of Sentences and Documents , 2014, ICML.

[28]  Tetsuya Sakai,et al.  Search Result Diversification Based on Hierarchical Intents , 2015, CIKM.

[29]  Katja Hofmann,et al.  Information Retrieval manuscript No. (will be inserted by the editor) Balancing Exploration and Exploitation in Listwise and Pairwise Online Learning to Rank for Information Retrieval , 2022 .

[30]  W. Bruce Croft,et al.  Diversity by proportionality: an election-based approach to search result diversification , 2012, SIGIR '12.

[31]  Thorsten Joachims,et al.  Predicting diverse subsets using structural SVMs , 2008, ICML '08.

[32]  Xueqi Cheng,et al.  Learning for search result diversification , 2014, SIGIR.

[33]  Jiafeng Guo,et al.  Reinforcement Learning to Rank with Markov Decision Process , 2017, SIGIR.

[34]  Qiang Yang,et al.  Partially Observable Markov Decision Process for Recommender Systems , 2016, ArXiv.

[35]  Craig MacDonald,et al.  Explicit Search Result Diversification through Sub-queries , 2010, ECIR.

[36]  Xueqi Cheng,et al.  Learning Maximal Marginal Relevance Model via Directly Optimizing Diversity Evaluation Measures , 2015, SIGIR.

[37]  Jade Goldstein-Stewart,et al.  The use of MMR, diversity-based reranking for reordering documents and producing summaries , 1998, SIGIR '98.

[38]  Filip Radlinski,et al.  Improving personalized web search using result diversification , 2006, SIGIR.

[39]  Arjen P. de Vries,et al.  Combining implicit and explicit topic representations for result diversification , 2012, SIGIR '12.

[40]  Yong Yu,et al.  Enhancing diversity, coverage and balance for summarization through structure learning , 2009, WWW '09.