Max-Sum diversification, monotone submodular functions and dynamic updates

Result diversification has many important applications in databases, operations research, information retrieval, and finance. In this paper, we study and extend a particular version of result diversification, known as max-sum diversification. More specifically, we consider the setting where we are given a set of elements in a metric space and a set valuation function f defined on every subset. For any given subset S, the overall objective is a linear combination of f(S) and the sum of the distances induced by S. The goal is to find a subset S satisfying some constraints that maximizes the overall objective. This problem was first studied by Gollapudi and Sharma in [17] for modular set functions and for sets satisfying a cardinality constraint (uniform matroids). In their paper, they give a 2-approximation algorithm by reducing to an earlier result in [20]. The first part of this paper considers an extension of the modular case to the monotone submodular case, for which the algorithm in [17] no longer applies. Interestingly, we are able to maintain the same 2-approximation using a natural but different greedy algorithm. We then further extend the problem by considering any matroid constraint and show that a natural single-swap local search algorithm provides a 2-approximation in this more general setting. This extends the Nemhauser, Wolsey and Fisher approximation result [20] for the problem of submodular function maximization subject to a matroid constraint (without the distance function component). The second part of the paper focuses on dynamic updates for the modular case. Suppose we have a good initial approximate solution and then there is a single weight perturbation, either on the valuation of an element or on the distance between two elements. Given that users expect some stability in the results they see, we ask how easy it is to maintain a good approximation without significantly changing the initial set. We measure this by the number of updates, where each update is a swap of a single element in the current solution with a single element outside the current solution. We show that we can maintain an approximation ratio of 3 with just a single update if the perturbation is not too large.
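To make the objective and the notion of an update concrete, here is a minimal Python sketch for the modular case under a cardinality constraint. The abstract does not spell out the greedy rule, so the rule below (add the element with the largest gain in the combined objective) is an assumption for illustration and carries no approximation guarantee. The names `objective`, `greedy`, `single_swap`, the trade-off weight `lam`, and the toy data are all hypothetical.

```python
from itertools import combinations

def objective(S, f, d, lam):
    """f(S) plus lam times the sum of pairwise distances inside S."""
    return f(S) + lam * sum(d(u, v) for u, v in combinations(S, 2))

def greedy(universe, k, f, d, lam):
    """Assumed rule: k times, add the element that most increases the objective."""
    S = set()
    for _ in range(min(k, len(universe))):
        best = max((u for u in universe if u not in S),
                   key=lambda u: objective(S | {u}, f, d, lam))
        S.add(best)
    return S

def single_swap(S, universe, f, d, lam):
    """One update: the best exchange of one element in S for one outside S.
    Returns a copy of S unchanged if no swap improves the objective."""
    best, best_val = set(S), objective(S, f, d, lam)
    for out in S:
        for inn in universe:
            if inn in S:
                continue
            cand = (S - {out}) | {inn}
            val = objective(cand, f, d, lam)
            if val > best_val:
                best, best_val = cand, val
    return best

# Toy instance: points on a line, modular valuation given by weights.
universe = [0, 1, 2, 10]
weights = {0: 1, 1: 5, 2: 1, 10: 1}
f = lambda S: sum(weights[u] for u in S)   # modular set function
d = lambda u, v: abs(u - v)                # metric on the line

solution = greedy(universe, 2, f, d, lam=1.0)

# Perturb the valuation of one element, then repair with a single swap.
weights[0] = 20
repaired = single_swap(solution, universe, f, d, lam=1.0)
```

On this toy instance the greedy pass selects the heavy element 1 and the far-away element 10; after the weight of element 0 is raised, a single swap replaces 1 with 0. A general matroid constraint would additionally require an independence check on each candidate set, which this sketch omits.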

[1] Jack Edmonds, et al. Matroids and the greedy algorithm, 1971, Math. Program.

[2] Hui Lin, et al. A Class of Submodular Functions for Document Summarization, 2011, ACL.

[3] S. S. Ravi, et al. Heuristic and Special Case Algorithms for Dispersion Problems, 1994, Oper. Res.

[4] Gerard Salton, et al. Research and Development in Information Retrieval, 1982, Lecture Notes in Computer Science.

[5] Allan Borodin, et al. Weakly Submodular Functions, 2014, ArXiv.

[6] M. L. Fisher, et al. An analysis of approximations for maximizing submodular set functions—I, 1978, Math. Program.

[7] Sreenivas Gollapudi, et al. Diversifying search results, 2009, WSDM '09.

[8] M. Kuby. Programming Models for Facility Dispersion: The p‐Dispersion and Maxisum Dispersion Problems, 2010.

[9] Peter Fankhauser, et al. DivQ: diversification for keyword search over structured databases, 2010, SIGIR.

[10] Craig MacDonald, et al. Intent-aware search result diversification, 2011, SIGIR.

[11] Thorsten Joachims, et al. Predicting diverse subsets using structural SVMs, 2008, ICML '08.

[12] Sándor P. Fekete, et al. Maximum Dispersion and Geometric Maximum Weight Cliques, 2003, Algorithmica.

[13] Evaggelia Pitoura, et al. Search result diversification, 2010, SGMD.

[14] Subhash Khot, et al. Ruling out PTAS for graph min-bisection, densest subgraph and bipartite clique, 2004, 45th Annual IEEE Symposium on Foundations of Computer Science.

[15] Wolfgang Nejdl, et al. Incremental diversification for very large sets: a streaming-based approach, 2011, SIGIR '11.

[16] Sreenivas Gollapudi, et al. An axiomatic approach for result diversification, 2009, WWW '09.

[17] Divesh Srivastava, et al. On query result diversification, 2011, 2011 IEEE 27th International Conference on Data Engineering.

[18] Alexander Schrijver, et al. Combinatorial optimization. Polyhedra and efficiency, 2003.

[19] S. M. García, et al. 2014: , 2020, A Party for Lazarus.

[20] D. W. Wang, et al. A Study on Two Geometric Location Problems, 1988, Information Processing Letters.

[21] Krishna Bharat, et al. Diversifying web search results, 2010, WWW '10.

[22] Refael Hassin, et al. Approximation algorithms for maximum dispersion, 1997, Oper. Res. Lett.

[23] Hui Lin, et al. Graph-based submodular selection for extractive summarization, 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[24] Xiaojin Zhu, et al. Improving Diversity in Ranking using Absorbing Random Walks, 2007, NAACL.

[25] Divesh Srivastava, et al. DivDB: A System for Diversifying Query Results, 2011, Proc. VLDB Endow.

[26] Marcin Sydow, et al. Improved Approximation Guarantee for Max Sum Diversification with Parameterised Triangle Inequality, 2014, ISMIS.

[27] Tao Qin, et al. LETOR: A benchmark collection for research on learning to rank for information retrieval, 2010, Information Retrieval.

[28] Anthony K. H. Tung, et al. BROAD: Diversified Keyword Search in Databases, 2011, Proc. VLDB Endow.

[29] E. Erkut. The discrete p-dispersion problem, 1990.

[30] R. Rado. Note on Independence Functions, 1957.

[31] Filip Radlinski, et al. Learning diverse rankings with multi-armed bandits, 2008, ICML '08.

[32] Erhan Erkut, et al. Analytical models for locating undesirable facilities, 1989.

[33] Joseph Naor, et al. Approximation Algorithms for Diversified Search Ranking, 2010, ICALP.

[34] Takeshi Tokuyama, et al. Finding subsets maximizing minimum structures, 1995, SODA '95.

[35] Hui Lin, et al. Multi-document Summarization via Budgeted Maximization of Submodular Functions, 2010, NAACL.

[36] Cong Yu, et al. It takes variety to make a world: diversification in recommender systems, 2009, EDBT '09.

[37] Vahab S. Mirrokni, et al. Diversity maximization under matroid constraints, 2013, KDD.

[38] Evaggelia Pitoura, et al. Diversity over Continuous Data, 2009, IEEE Data Eng. Bull.

[39] Ji-Rong Wen, et al. Multi-dimensional search result diversification, 2011, WSDM '11.

[40] Benjamin E. Birnbaum, et al. An Improved Analysis for a Greedy Remote-Clique Algorithm Using Factor-Revealing LPs, 2007, Algorithmica.

[41] Jan Vondrák, et al. Maximizing a Monotone Submodular Function Subject to a Matroid Constraint, 2011, SIAM J. Comput.

[42] Filip Radlinski, et al. Learning optimally diverse rankings over large document collections, 2010, ICML.

[43] Thorsten Joachims, et al. Dynamic ranked retrieval, 2011, WSDM '11.

[44] Maxim Sviridenko, et al. A note on maximizing a submodular set function subject to a knapsack constraint, 2004, Oper. Res. Lett.

[45] Barun Chandra, et al. Approximation Algorithms for Dispersion Problems, 2001, J. Algorithms.

[46] David R. Karger, et al. Less is More Probabilistic Models for Retrieving Fewer Relevant Documents, 2006.

[47] Yi Chen, et al. Structured Search Result Differentiation, 2009, Proc. VLDB Endow.

[48] R. Brualdi. Comments on bases in dependence structures, 1969, Bulletin of the Australian Mathematical Society.

[49] Andrzej Czygrinow. Maximum dispersion problem in dense graphs, 2000, Oper. Res. Lett.

[50] Barun Chandra, et al. Facility Dispersion and Remote Subgraphs, 1995, SWAT.

[51] Jade Goldstein-Stewart, et al. The use of MMR, diversity-based reranking for reordering documents and producing summaries, 1998, SIGIR '98.

[52] Rajeev Motwani, et al. On syntactic versus computational views of approximability, 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.