Computing k-Regret Minimizing Sets

Regret minimizing sets are a recent approach to representing a dataset D by a small subset R of r representative data points. The set R is chosen such that executing any top-1 query on R rather than on D is minimally perceptible to any user. However, such a subset R may not exist, even for modest sizes r. In this paper, we introduce a relaxation to k-regret minimizing sets, whereby a top-1 query on R returns a result imperceptibly close to the top-k results on D. We show that, in general, this problem is NP-hard with or without the relaxation. For the specific case of two dimensions, we give an efficient dynamic programming, plane sweep algorithm based on geometric duality that finds an optimal solution. For arbitrary dimension, we give an empirically effective, greedy, randomized algorithm based on linear programming. With these algorithms, and with small values of k greater than 1, we can find much smaller subsets R that better summarize D.
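The abstract states the objective only informally. The sketch below makes it concrete under assumptions that are not taken from the text. Utilities are assumed to be linear with nonnegative weights. The k-regret ratio of R for one utility is assumed to be the relative shortfall of R's best score from the k-th best score in D, clamped at zero. The maximum over all utilities is approximated by taking the worst case over randomly sampled weight vectors, which gives only a lower bound on the true maximum. The greedy loop is a simplified stand-in that scores candidates against those sampled utilities. It is not the paper's linear-programming-based randomized greedy algorithm, nor its exact two-dimensional plane sweep, and every function name here is hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def score(w: Sequence[float], p: Point) -> float:
    """Linear utility of point p under weight vector w (higher is better)."""
    return sum(wi * pi for wi, pi in zip(w, p))


def k_regret_ratio(D: List[Point], R: List[Point], w: Sequence[float], k: int) -> float:
    """Assumed definition: relative shortfall of R's best score from the k-th
    best score in D, clamped at 0 (matching or beating the k-th best = no regret).
    Requires len(D) >= k and a non-empty R."""
    kth = sorted((score(w, p) for p in D), reverse=True)[k - 1]
    best_in_r = max(score(w, p) for p in R)
    if kth <= 0:  # degenerate utility (e.g. all-zero weights): treat as no regret
        return 0.0
    return max(0.0, (kth - best_in_r) / kth)


def sample_weights(d: int, m: int, seed: int = 0) -> List[List[float]]:
    """Hypothetical helper: m random nonnegative weight vectors in [0, 1)^d."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(d)] for _ in range(m)]


def max_k_regret(D: List[Point], R: List[Point], W: List[List[float]], k: int) -> float:
    """Worst k-regret ratio over the sampled utilities W only; this is a lower
    bound on the maximum over all nonnegative linear utilities."""
    return max(k_regret_ratio(D, R, w, k) for w in W)


def greedy_k_regret(D: List[Point], r: int, k: int, W: List[List[float]]) -> List[Point]:
    """Simplified greedy (not the paper's LP-based method): repeatedly add the
    point whose inclusion gives the lowest sampled max k-regret ratio."""
    R: List[Point] = []
    while len(R) < r:
        candidates = [p for p in D if p not in R]
        if not candidates:  # every point already chosen
            break
        R.append(min(candidates, key=lambda p: max_k_regret(D, R + [p], W, k)))
    return R
```

For example, with D = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.9)] and k = 2, the balanced point (0.9, 0.9) always scores at least as high as the second-best point for any nonnegative weights, so `greedy_k_regret(D, 1, 2, sample_weights(2, 200))` selects it alone. With k = 1 the same point falls 10% short of (1.0, 0.0) when only the first attribute matters, which illustrates why a small k greater than 1 can permit a smaller R.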
