Finding Diverse, High-Value Representatives on a Surface of Answers

Many applications must present users with only a small subset of the possible answers. The set of all possible answers can be viewed as an elevation surface over a domain: the elevation measures the quality of each answer, and the dimensions of the domain correspond to the attributes of the answers by which similarity between answers is measured. This paper considers the problem of finding a diverse set of k high-quality representatives for such a surface. We show that existing methods for diversified top-k and weighted clustering problems are inadequate for this problem, and we propose k-DHR as a better formulation. We show that k-DHR has a monotone submodular objective function, and we develop efficient algorithms for solving k-DHR with provable guarantees. Extensive experiments demonstrate the usefulness of the results produced by k-DHR in computational lead-finding and fact-checking applications, as well as the efficiency and effectiveness of our algorithms.
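To illustrate why a monotone submodular objective matters, the following is a minimal sketch of the standard greedy heuristic, which is known to achieve a (1 - 1/e) approximation for maximizing a monotone submodular function under a cardinality constraint. The abstract does not state the k-DHR objective, so the sketch substitutes a hypothetical facility-location-style coverage function: each answer is credited with the best quality-weighted similarity offered by any chosen representative. The names `similarity`, `coverage`, `greedy_representatives`, and the `bandwidth` parameter are all assumptions made for illustration, and quality values are assumed to be nonnegative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def similarity(a: Point, b: Point, bandwidth: float = 1.0) -> float:
    """Gaussian similarity in (0, 1]; an assumed stand-in for the paper's notion."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def coverage(selected: Sequence[int], points: Sequence[Point],
             quality: Sequence[float], bandwidth: float = 1.0) -> float:
    """Hypothetical objective: f(S) = sum_p max_{s in S} quality[s] * sim(p, s).

    With nonnegative quality this is monotone and submodular.
    It is NOT claimed to be the k-DHR objective.
    """
    if not selected:
        return 0.0
    return sum(
        max(quality[s] * similarity(p, points[s], bandwidth) for s in selected)
        for p in points
    )


def greedy_representatives(points: Sequence[Point], quality: Sequence[float],
                           k: int, bandwidth: float = 1.0) -> List[int]:
    """Pick up to k indexes, each time adding the largest marginal gain."""
    n = len(points)
    selected: List[int] = []
    best_cover = [0.0] * n  # current best credit for every answer
    for _ in range(min(k, n)):
        best_gain, best_index = -1.0, -1
        for c in range(n):
            if c in selected:
                continue
            # Marginal gain of adding candidate c to the current selection.
            gain = sum(
                max(0.0, quality[c] * similarity(points[p], points[c], bandwidth)
                    - best_cover[p])
                for p in range(n)
            )
            if gain > best_gain:
                best_gain, best_index = gain, c
        selected.append(best_index)
        for p in range(n):
            credit = quality[best_index] * similarity(
                points[p], points[best_index], bandwidth)
            best_cover[p] = max(best_cover[p], credit)
    return selected


if __name__ == "__main__":
    # Two well-separated clusters on a line; the two best answers share a cluster.
    pts = [(0.0,), (0.1,), (5.0,), (5.1,)]
    q = [1.0, 0.9, 0.8, 0.7]
    print(greedy_representatives(pts, q, k=2))
```

On this toy input a plain top-2 by quality would return the two neighbouring answers at indexes 0 and 1, whereas the greedy selection under the assumed objective returns indexes 0 and 2, one representative per cluster. That is the kind of diversity-versus-quality trade-off the abstract describes, shown here only for the stand-in objective.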
