Local Search for Max-Sum Diversification

We provide simple and fast polynomial-time approximation schemes (PTASs) for several variants of the max-sum diversification problem which, in its most basic form, is as follows: Given n points p_1,...,p_n in R^d and an integer k, select k points such that the average Euclidean distance between these points is maximized. This problem commonly appears in information retrieval and web search, where the goal is to select a diverse set of points from the input, and it has recently received considerable attention in this context. We present new techniques to analyze natural local search algorithms. This leads to a (1-O(1/k))-approximation for distances of negative type, even subject to any matroid constraint of rank k, in time O(n k^2 log k), assuming that distance evaluations and calls to the independence oracle take constant time. Negative type distances include Euclidean distances and many further natural distances as special cases. Our result easily transforms into a PTAS and improves on the only previously known PTAS for this setting, which relies on convex optimization techniques in an n-dimensional space and is impractical for large data sets. In contrast, our procedure has an (optimal) linear dependence on n. Using generalized exchange properties of matroid intersection, we show that a PTAS can be obtained for matroid intersection constraints as well. Moreover, our techniques, being based on local search, are conceptually simple and allow for various extensions. In particular, we get asymptotically optimal O(1)-approximations when combining the classic dispersion function with a monotone submodular objective, a very common class of functions used to measure diversity and relevance. This result leverages recent advances in local search techniques based on proxy functions, which yield optimal approximations for monotone submodular function maximization subject to a matroid constraint.
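To make the basic problem concrete, the following is a minimal sketch of a plain swap-based local search for the simplest variant only (Euclidean distances, choose exactly k points, no general matroid constraint). It is an illustration under assumptions, not the procedure analyzed in the paper: the function names `dispersion` and `local_search`, the starting solution, the first-improvement rule, and the tolerance `tol` are all hypothetical choices, and the sketch makes no claim to the O(n k^2 log k) running time or the (1-O(1/k)) guarantee stated above.

```python
import math
from itertools import combinations


def dispersion(points, selected):
    """Sum of pairwise Euclidean distances among the selected indices."""
    return sum(math.dist(points[i], points[j])
               for i, j in combinations(selected, 2))


def local_search(points, k, tol=1e-9):
    """Swap-based local search for max-sum diversification (illustrative).

    Starts from the first k points and repeatedly exchanges one selected
    point for one unselected point while the objective strictly improves
    by more than `tol`. Returns the sorted list of selected indices.
    """
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= number of points")

    selected = list(range(k))
    # contrib[v] = sum of distances from point v to every selected point.
    contrib = [sum(math.dist(points[v], points[u]) for u in selected)
               for v in range(n)]

    improved = True
    while improved:
        improved = False
        in_set = set(selected)
        for pos, u in enumerate(selected):
            for v in range(n):
                if v in in_set:
                    continue
                # Objective change when u leaves and v enters: v gains its
                # distances to the selected points other than u, and u's
                # own contribution is lost.
                gain = contrib[v] - math.dist(points[u], points[v]) - contrib[u]
                if gain > tol:
                    # Update every point's contribution for the new set.
                    for w in range(n):
                        contrib[w] += (math.dist(points[w], points[v])
                                       - math.dist(points[w], points[u]))
                    selected[pos] = v
                    improved = True
                    break
            if improved:
                break
    return sorted(selected)


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
    chosen = local_search(pts, 2)
    print(chosen, dispersion(pts, chosen))
```

Maximizing the sum of pairwise distances is equivalent to maximizing the average, since the number of pairs is fixed for a given k. The `contrib` array lets each candidate swap be evaluated with a single distance computation rather than by recomputing the whole objective.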
