Diversity maximization in doubling metrics

Diversity maximization is an important geometric optimization problem with many applications in recommender systems, machine learning, and search engines, among others. A typical diversification problem is as follows: given a finite metric space $(X,d)$ and a parameter $k \in \mathbb{N}$, find a subset of $k$ elements of $X$ that has maximum diversity. Many functions can be used to measure diversity. One of the most popular, called remote-clique, is the sum of the pairwise distances of the chosen elements. In this paper, we present novel results on three widely used diversity measures: remote-clique, remote-star, and remote-bipartition. Our main results are polynomial-time approximation schemes (PTASs) for these three diversification problems under the assumption that the metric space is doubling. This setting has been discussed in the recent literature, but the existence of such a PTAS was left open. Our results also hold when the distances are raised to a fixed power $q \geq 1$. This gives rise to further variants of the diversity functions, similar in spirit to the variants of clustering problems obtained by applying different powers to the distances. Finally, we prove that remote-clique with squared distances is NP-hard in doubling metric spaces.
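To make the three objectives concrete, here is a minimal Python sketch that evaluates them for a chosen set $S$, with every distance raised to a power $q \geq 1$. The abstract only defines remote-clique. The definitions used for remote-star (minimum over centers $c \in S$ of $\sum_{x \in S} d(c,x)^q$) and remote-bipartition (minimum, over splits of $S$ into $Q$ with $|Q| = \lfloor |S|/2 \rfloor$ and $S \setminus Q$, of the total $d^q$ weight across the split) are assumed here from the standard dispersion literature. The helper names and the choice of Euclidean distance are illustrative assumptions. The exhaustive search `brute_force_max` is exponential in $k$ and is only a reference for tiny instances. It is not the PTAS described in the paper.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance (an assumption; any metric d can be passed instead)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def remote_clique(S: List[Point], d: Metric, q: float = 1.0) -> float:
    # Sum of d(x, y)^q over all unordered pairs {x, y} of S.
    return sum(d(x, y) ** q for x, y in combinations(S, 2))


def remote_star(S: List[Point], d: Metric, q: float = 1.0) -> float:
    # Minimum over centers c in S of sum_{x in S} d(c, x)^q
    # (the term x = c contributes d(c, c) = 0).
    return min(sum(d(c, x) ** q for x in S) for c in S)


def remote_bipartition(S: List[Point], d: Metric, q: float = 1.0) -> float:
    # Minimum over splits of S into Q (|Q| = floor(|S| / 2)) and S \ Q
    # of the total d^q weight of pairs crossing the split.
    idx = range(len(S))
    best = float("inf")
    for Q in combinations(idx, len(S) // 2):
        rest = [i for i in idx if i not in Q]
        cut = sum(d(S[i], S[j]) ** q for i in Q for j in rest)
        best = min(best, cut)
    return best


def brute_force_max(
    X: Sequence[Point],
    k: int,
    f: Callable[[List[Point], Metric, float], float],
    d: Metric,
    q: float = 1.0,
) -> Tuple[Point, ...]:
    # Exhaustive search over all k-subsets of X: exponential in k,
    # meant only for checking tiny instances, not an approximation scheme.
    return max(combinations(X, k), key=lambda S: f(list(S), d, q))
```

For example, on the points $0, 1, 2, 10$ of the real line with $k = 2$, the search returns the pair $\{0, 10\}$, whose remote-clique value is $10$.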
