Diversity maximization under matroid constraints

Aggregator websites typically present documents in the form of representative clusters. To give users a broader perspective, it is important to deliver a diversified set of representative documents from those clusters. One approach to diversification is to maximize the average dissimilarity among documents. Another is to avoid showing several documents from the same category (e.g., from the same news channel). We combine these two diversification concepts by modeling the latter as a (partition) matroid constraint, and study diversity maximization problems under matroid constraints. We present the first constant-factor approximation algorithm for this problem, using a new technique. Our local search 0.5-approximation algorithm is also the first constant-factor approximation for the max-dispersion problem under matroid constraints. Our combinatorial proof technique for maximizing diversity under matroid constraints relies on the existence of a family of Latin squares, which may also be of independent interest. To apply these diversity maximization algorithms in the context of aggregator websites, and as a preprocessing step for our diversity maximization tool, we develop greedy clustering algorithms that maximize weighted coverage of a predefined set of topics. Our algorithms compute a set of cluster centers, around which the clusters are formed. We show that our algorithms perform better for diversity and coverage maximization in experiments on real (Twitter) and synthetic data in the context of real-time search over micro-posts. Finally, we perform a user study validating our algorithms and diversity metrics.
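To make the setting concrete, the following is a minimal sketch of a swap-based local search for sum-of-pairwise-distances dispersion under a partition matroid constraint. It is an illustration under stated assumptions, not the authors' algorithm: the function names, the explicit size parameter `k`, the first-improvement swap rule, and the toy data are all hypothetical, and the 0.5-approximation guarantee mentioned above is not claimed for this code.

```python
from itertools import combinations


def dispersion(selected, dist):
    """Sum of pairwise distances among the selected item indices."""
    return sum(dist[a][b] for a, b in combinations(selected, 2))


def is_independent(selected, category, capacity):
    """Partition matroid check: at most capacity[c] items from category c."""
    counts = {}
    for i in selected:
        c = category[i]
        counts[c] = counts.get(c, 0) + 1
        if counts[c] > capacity[c]:
            return False
    return True


def local_search(dist, category, capacity, k):
    """Single-swap local search (assumed variant, first improvement)."""
    n = len(dist)
    # Start from any feasible set of up to k items, built greedily by index.
    selected = []
    for i in range(n):
        if len(selected) < k and is_independent(selected + [i], category, capacity):
            selected.append(i)

    improved = True
    while improved:
        improved = False
        current = dispersion(selected, dist)
        for pos in range(len(selected)):
            for cand in range(n):
                if cand in selected:
                    continue
                # Replace the item at position pos with the candidate.
                trial = selected[:pos] + [cand] + selected[pos + 1:]
                if (is_independent(trial, category, capacity)
                        and dispersion(trial, dist) > current):
                    selected = trial
                    improved = True
                    break
            if improved:
                break
    return selected


# Toy data (hypothetical): four documents on a line, two per category.
points = [0, 1, 10, 11]
dist = [[abs(p - q) for q in points] for p in points]
category = ["A", "A", "B", "B"]
capacity = {"A": 1, "B": 1}  # show at most one document per category

result = local_search(dist, category, capacity, k=2)
print(sorted(result), dispersion(result, dist))
```

On this toy instance the capacity of one per category rules out picking both documents from the same source, so the search can only trade one document for another within the feasible sets. Each accepted swap strictly increases the objective, so the loop terminates.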
