Max-Sum Diversity Via Convex Programming

Diversity maximization is an important concept in information retrieval, computational geometry, and operations research. It usually appears as a variant of the following problem: given a ground set, constraints, and a function $f(\cdot)$ that measures the diversity of a subset, select a feasible subset $S$ such that $f(S)$ is maximized. A prominent diversification measure in this context is the \emph{sum-dispersion} function $f(S) = \sum_{x,y \in S} d(x,y)$, the sum of the pairwise distances in $S$. The corresponding diversity maximization problem is known as \emph{max-sum} or \emph{sum-sum diversification}. Many recent results concern the design of constant-factor approximation algorithms for diversification problems involving the sum-dispersion function under a matroid constraint. In this paper, we present a PTAS for the max-sum diversification problem under a matroid constraint for distances $d(\cdot,\cdot)$ of \emph{negative type}. Distances of negative type include, for example, metric distances stemming from the $\ell_2$ and $\ell_1$ norms, as well as the cosine (or spherical) distance and the Jaccard distance, which are popular similarity metrics in web and image search.
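
As an illustration of the two notions used above, the following minimal Python sketch evaluates the sum-dispersion of a small point set and the quadratic form that appears in the standard definition of negative type: a distance $d$ is of negative type if $\sum_{i,j} b_i b_j \, d(x_i,x_j) \le 0$ for all points $x_1,\dots,x_n$ and all real $b_1,\dots,b_n$ with $\sum_i b_i = 0$. This definition is not spelled out in the abstract and is stated here as background. The function names `sum_dispersion`, `l1`, and `negative_type_form` are hypothetical, and the sketch assumes that the sum in $f(S)$ ranges over unordered pairs. It only evaluates the definitions; it is not the PTAS of the paper.

```python
from itertools import combinations


def l1(x, y):
    """l_1 (Manhattan) distance between two points given as tuples."""
    return sum(abs(a - b) for a, b in zip(x, y))


def sum_dispersion(points, dist):
    """Sum of dist(x, y) over all unordered pairs {x, y} of points.

    Assumption: each pair is counted once; counting ordered pairs
    would double the value.
    """
    return sum(dist(x, y) for x, y in combinations(points, 2))


def negative_type_form(points, dist, b):
    """Evaluate sum_{i,j} b_i * b_j * dist(x_i, x_j) over ordered pairs.

    For a distance of negative type this value is <= 0 whenever the
    coefficients b sum to zero.
    """
    n = len(points)
    return sum(
        b[i] * b[j] * dist(points[i], points[j])
        for i in range(n)
        for j in range(n)
    )


points = [(0, 0), (1, 0), (0, 1)]
dispersion = sum_dispersion(points, l1)

# Two coefficient vectors that sum to zero.
form_a = negative_type_form(points, l1, [1, -1, 0])
form_b = negative_type_form(points, l1, [1, 1, -2])
```

For these three points the pairwise $\ell_1$ distances are 1, 1, and 2, so the sum-dispersion is 4, and both zero-sum coefficient vectors yield a non-positive value of the quadratic form, as expected for the $\ell_1$ distance. Checking a handful of vectors does not prove that a distance is of negative type; it can only expose a violation.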
