Efficient k-Regret Query Algorithm with Restriction-free Bound for any Dimensionality

Extracting interesting tuples from a large database is an important problem in multi-criteria decision making. Two representative queries have been proposed in the literature: top-k queries and skyline queries. A top-k query requires users to specify their utility functions beforehand and then returns k tuples. A skyline query does not require any utility function, but it gives no control over the number of tuples returned. Recently, the k-regret query was proposed and has received attention from the community because it requires no utility function from users and its output size is controllable, thus avoiding the deficiencies of both top-k and skyline queries. Specifically, it returns k tuples that minimize a criterion called the maximum regret ratio. In this paper, we present a lower bound on the maximum regret ratio for the k-regret query. We also propose a novel algorithm, called SPHERE, whose upper bound on the maximum regret ratio is asymptotically optimal and restriction-free for any dimensionality, the best-known result in the literature. We conducted extensive experiments showing that SPHERE performs better than the state-of-the-art methods for the k-regret query.
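
To make the maximum regret ratio concrete, here is a minimal sketch of how it could be estimated for a candidate answer set. It is not the SPHERE algorithm. It assumes the usual definition from the k-regret literature: for one utility function, the regret ratio is the relative utility lost when the user picks the best tuple from the returned subset instead of from the whole database, and the maximum regret ratio is the worst case over all utility functions. The sketch further assumes linear utility functions with non-negative weights and approximates the worst case by random sampling, so it can only underestimate the true maximum. The function names and the toy data are hypothetical.

```python
import random


def regret_ratio(database, subset, weights):
    """Regret ratio of `subset` for one linear utility function (assumed form)."""
    def utility(point):
        return sum(w * x for w, x in zip(weights, point))

    best_in_database = max(utility(p) for p in database)
    best_in_subset = max(utility(p) for p in subset)
    if best_in_database <= 0:
        # Degenerate utility function: nothing to regret.
        return 0.0
    return (best_in_database - best_in_subset) / best_in_database


def estimate_max_regret_ratio(database, subset, num_samples=1000, seed=0):
    """Sampled estimate (a lower bound) of the maximum regret ratio."""
    rng = random.Random(seed)
    dimensions = len(database[0])
    worst = 0.0
    for _ in range(num_samples):
        # Hypothetical sampling scheme: independent uniform weights in [0, 1).
        weights = [rng.random() for _ in range(dimensions)]
        worst = max(worst, regret_ratio(database, subset, weights))
    return worst


# Toy database of three 2-dimensional tuples (illustrative values only).
toy_database = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]
toy_subset = [(0.6, 0.6)]
estimate = estimate_max_regret_ratio(toy_database, toy_subset)
```

Returning the whole database gives a regret ratio of zero for every utility function, while a small subset trades a nonzero ratio for a controllable output size. A k-regret query looks for the size-k subset that makes this worst-case ratio as small as possible.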
