Linear Relaxations for Finding Diverse Elements in Metric Spaces

Choosing a diverse subset of a large collection of points in a metric space is a fundamental problem, with applications in feature selection, recommender systems, web search, data summarization, etc. Various notions of diversity have been proposed, tailored to different applications. The general algorithmic goal is to find a subset of points that maximize diversity, while obeying a cardinality (or more generally, matroid) constraint. The goal of this paper is to develop a novel linear programming (LP) framework that allows us to design approximation algorithms for such problems. We study an objective known as {\em sum-min} diversity, which is known to be effective in many applications, and give the first constant factor approximation algorithm. Our LP framework allows us to easily incorporate additional constraints, as well as secondary objectives. We also prove a hardness result for two natural diversity objectives, under the so-called {\em planted clique} assumption. Finally, we study the empirical performance of our algorithm on several standard datasets. We first study the approximation quality of the algorithm by comparing with the LP objective. Then, we compare the quality of the solutions produced by our method with other popular diversity maximization algorithms.

[1]  Nuno Vasconcelos Feature selection by maximum marginal diversity: optimality and implications for visual recognition , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[2]  Vahab S. Mirrokni,et al.  Diversity maximization under matroid constraints , 2013, KDD.

[3]  Thorsten Meinl,et al.  Maximum-Score Diversity Selection for Early Drug Discovery , 2011, J. Chem. Inf. Model..

[4]  Santosh S. Vempala,et al.  Statistical Algorithms and a Lower Bound for Detecting Planted Cliques , 2012, J. ACM.

[5]  Refael Hassin,et al.  Approximation algorithms for maximum dispersion , 1997, Oper. Res. Lett..

[6]  William J. Cook,et al.  Combinatorial optimization , 1997 .

[7]  Yuli Ye,et al.  Max-Sum diversification, monotone submodular functions and dynamic updates , 2012, PODS '12.

[8]  Éva Tardos,et al.  Fast approximation algorithms for fractional packing and covering problems , 1991, [1991] Proceedings 32nd Annual Symposium of Foundations of Computer Science.

[9]  David A. Forsyth,et al.  Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.

[10]  Thomas S. Huang,et al.  Graph Regularized Nonnegative Matrix Factorization for Data Representation. , 2011, IEEE transactions on pattern analysis and machine intelligence.

[11]  Kamesh Munagala,et al.  Consideration set generation in commerce search , 2011, WWW.

[12]  Sreenivas Gollapudi,et al.  An axiomatic approach for result diversification , 2009, WWW '09.

[13]  Barun Chandra,et al.  Approximation Algorithms for Dispersion Problems , 2001, J. Algorithms.

[14]  Fuhui Long,et al.  Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy , 2003, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Jonathan J. Hull,et al.  A Database for Handwritten Text Recognition Research , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[16]  Jeffrey Xu Yu,et al.  Diversifying Top-K Results , 2012, Proc. VLDB Endow..

[17]  M. R. Rao,et al.  Combinatorial Optimization , 1992, NATO ASI Series.

[18]  Divesh Srivastava,et al.  On query result diversification , 2011, 2011 IEEE 27th International Conference on Data Engineering.

[19]  Filip Radlinski,et al.  Improving personalized web search using result diversification , 2006, SIGIR.

[20]  Sameer A. Nene,et al.  Columbia Object Image Library (COIL100) , 1996 .

[21]  Jan Vondrák,et al.  Dependent Randomized Rounding via Exchange Properties of Combinatorial Structures , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[22]  Xin Wen,et al.  BindingDB: a web-accessible database of experimentally determined protein–ligand binding affinities , 2006, Nucleic Acids Res..