Diversity Maximization via Composable Coresets

Given a set S of points in a metric space and a diversity measure div(·) defined over subsets of S, the goal of the diversity maximization problem is to find a subset T ⊆ S of size k that maximizes div(T). Motivated by applications in massive data processing, we consider the composable coreset framework. In this framework, a coreset for a diversity measure is called α-composable if, for any collection of sets and their corresponding coresets, the maximum diversity of the union of the coresets approximates the maximum diversity of the union of the sets within a factor of α. We present composable coresets with near-optimal approximation factors for several notions of diversity, including remote-clique, remote-cycle, and remote-tree. We also prove a general lower bound on the approximation factor of composable coresets for a large class of diversity maximization problems.
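The following is a minimal sketch of the composable coreset pipeline for the remote-clique measure (the sum of pairwise distances), assuming points in the Euclidean plane. Each part of a partition is reduced to a small coreset independently, and the final k-subset is chosen only from the union of those coresets. The per-part coreset here is Gonzalez's farthest-point greedy, used as an assumed stand-in; it is not necessarily the construction from the paper. The helper names `gmm`, `best_subset`, and `composable_coreset_max` are hypothetical. The final step is exact brute force, so the sketch only runs on tiny inputs.

```python
import itertools
import math
import random

def remote_clique(T):
    """Remote-clique diversity: the sum of all pairwise Euclidean distances in T."""
    return sum(math.dist(p, q) for p, q in itertools.combinations(T, 2))

def gmm(points, k):
    """Farthest-point greedy (assumed stand-in coreset): start from the first
    point, then repeatedly add the point farthest from the chosen set."""
    chosen = [points[0]]
    while len(chosen) < min(k, len(points)):
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in chosen))
        chosen.append(farthest)
    return chosen

def best_subset(points, k, div):
    """Exact diversity maximization by brute force over all k-subsets
    (exponential in k; only usable on tiny inputs)."""
    return max(itertools.combinations(points, k), key=div)

def composable_coreset_max(parts, k, div, coreset=gmm):
    """Reduce each part to a coreset independently, then solve the
    diversity maximization problem on the union of the coresets only."""
    union = [p for part in parts for p in coreset(part, k)]
    return best_subset(union, k, div)

random.seed(0)
S = [(random.random(), random.random()) for _ in range(24)]
parts = [S[i::3] for i in range(3)]  # an arbitrary 3-way partition of S
k = 4

opt = remote_clique(best_subset(S, k, remote_clique))
got = remote_clique(composable_coreset_max(parts, k, remote_clique))
ratio = got / opt  # at most 1 (up to rounding): the union of coresets is a subset of S
print(f"remote-clique: coreset value / optimum = {ratio:.3f}")
```

The printed ratio shows how much of the optimal diversity survives the coreset step. Because each part is processed in isolation, the parts could live on separate machines or be handled in separate passes, and only the small coresets need to be merged. Any other per-part `coreset` function can be passed in without changing the rest of the pipeline.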
