A Unified Optimization Algorithm For Solving "Regret-Minimizing Representative" Problems

Given a database with numeric attributes, it is often of interest to rank the tuples according to linear scoring functions. For a scoring function and a subset of tuples, the regret of the subset is defined as the (relative) difference in scores between the top-1 tuple of the subset and the top-1 tuple of the entire database. Finding the regret-ratio minimizing set (RRMS), i.e., the subset of a required size k that minimizes the maximum regret-ratio across all possible ranking functions, has been a well-studied problem in recent years. This problem is known to be NP-complete and there are several approximation algorithms for it. Other NP-complete variants have also been investigated, e.g., finding the set of size k that minimizes the average regret ratio over all linear functions. Prior work has designed customized algorithms for different variants of the problem, and these algorithms are unlikely to generalize easily to other variants. In this paper we take a different path towards tackling these problems. In contrast to prior work, we propose a unified algorithm for solving different problem variants. Unification is done by localizing the customization to the design of variant-specific subroutines or “oracles” that are called by our algorithm. Our unified algorithm takes inspiration from the seemingly unrelated problem of clustering from data mining, and the corresponding K-MEDOID algorithm. In designing our algorithm we make several contributions that draw on techniques such as linear programming, edge sampling in graphs, volume estimation of multi-dimensional convex polytopes, and several others. We provide rigorous theoretical analysis, as well as substantial experimental evaluations over real and synthetic data sets to demonstrate the practical feasibility of our approach.

PVLDB Reference Format: Suraj Shetiya, Abolfazl Asudeh, Sadia Ahmed and Gautam Das. A Unified Optimization Algorithm For Solving “Regret-Minimizing Representative” Problems. PVLDB, 13(3): 239-251, 2019. DOI: https://doi.org/10.14778/3368289.3368291
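
To make the regret-ratio definition in the abstract concrete, the following is a minimal sketch of how the maximum and average regret ratio of a candidate subset could be estimated by sampling linear scoring functions. It is an illustration only and not the paper's algorithm: the function names (`score`, `regret_ratio`, `estimate_regret`), the uniform sampling of non-negative weights, and the toy database are all assumptions made for this example.

```python
import random


def score(weights, tup):
    """Linear scoring function: weighted sum of the tuple's attributes."""
    return sum(w * x for w, x in zip(weights, tup))


def regret_ratio(weights, database, subset):
    """Relative score gap between the top-1 tuple of the database
    and the top-1 tuple of the subset, for one scoring function."""
    best_db = max(score(weights, t) for t in database)
    best_subset = max(score(weights, t) for t in subset)
    if best_db <= 0:
        # Degenerate function (e.g. all-zero weights): no regret is defined.
        return 0.0
    return (best_db - best_subset) / best_db


def estimate_regret(database, subset, num_functions=1000, seed=0):
    """Estimate (max, average) regret ratio over sampled functions.

    Assumption: weights are drawn uniformly from [0, 1) per attribute.
    Sampling only approximates the maximum and average over all
    linear functions.
    """
    rng = random.Random(seed)
    dims = len(database[0])
    ratios = []
    for _ in range(num_functions):
        weights = [rng.random() for _ in range(dims)]
        ratios.append(regret_ratio(weights, database, subset))
    return max(ratios), sum(ratios) / len(ratios)


# Hypothetical toy database with two numeric attributes.
database = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]
max_rr, avg_rr = estimate_regret(database, subset=[(0.6, 0.6)])
```

Here the single tuple `(0.6, 0.6)` serves as a size-1 representative. A function that weights only the first attribute prefers `(1.0, 0.0)`, so the regret ratio for that function is (1.0 - 0.6) / 1.0 = 0.4, which is also the worst case for this toy data.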
