Reaching the Top of the Skyline: An Efficient Indexed Algorithm for Top-k Skyline Queries

Criteria that induce a Skyline naturally represent users' preference conditions, which are useful for discarding irrelevant data in large datasets. However, in high-dimensional Skyline spaces, the Skyline itself can still be very large, making it infeasible for users to process this set of points. The Top-k Skyline approach has been proposed to identify the best points within the Skyline. Top-k Skyline uses discriminatory criteria to induce a total order over the points that comprise the Skyline, and returns the best, or top-k, objects according to these criteria. Several algorithms have been defined to compute the top-k objects of the Skyline; while existing solutions are able to produce the Top-k Skyline, they may be very costly. First, state-of-the-art Top-k Skyline solutions require computing the whole Skyline; second, they probe the multicriteria function on every Skyline point. Thus, if k is much smaller than the cardinality of the Skyline, these solutions may be very inefficient, because a large number of unnecessary probes may be evaluated. In this paper, we propose TKSI, an efficient solution for the Top-k Skyline that overcomes the drawbacks of existing solutions. TKSI is an index-based algorithm that computes only the subset of the Skyline required to produce the top-k objects; thus, TKSI is able to minimize the number of unnecessary probes. We have empirically studied the quality of TKSI, and we report initial experimental results showing that TKSI speeds up the computation of the Top-k Skyline by at least 50% with respect to state-of-the-art solutions when k is smaller than the size of the Skyline.
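To make the contrast between computing the whole Skyline and computing only the part needed for the top-k concrete, here is a minimal Python sketch. It is not TKSI. It assumes every dimension is minimized, assumes a monotone scoring function where lower is better (the hypothetical `score` parameter, here simply `sum`), and uses Python's `sorted` as a stand-in for an index that already delivers points in score order. The names `dominates`, `naive_topk_skyline`, and `early_stop_topk_skyline` are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """True if p dominates q: no worse in every dimension and strictly better
    in at least one (smaller values are better in this sketch)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def naive_topk_skyline(points: Sequence[Point],
                       score: Callable[[Point], float], k: int) -> List[Point]:
    """Baseline: compute the whole Skyline, then score every Skyline point."""
    skyline = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(skyline, key=score)[:k]


def early_stop_topk_skyline(points: Sequence[Point],
                            score: Callable[[Point], float], k: int) -> List[Point]:
    """Scan points in ascending score order (sorted() stands in for an index)
    and stop as soon as k Skyline points have been confirmed."""
    if k <= 0:
        return []
    result: List[Point] = []
    for p in sorted(points, key=score):
        # Membership test against the full dataset, so correctness does not
        # depend on how ties in the score are broken.
        if not any(dominates(q, p) for q in points):
            result.append(p)
            if len(result) == k:
                break  # the rest of the Skyline is never tested
    return result
```

With `pts = [(1, 9), (2, 4), (3, 3), (4, 1), (5, 5), (6, 2), (9, 9)]` and `k = 2`, both functions return `[(4, 1), (2, 4)]`, but the early-stop version tests only two points for Skyline membership instead of all seven. Because both versions rely on stable sorting, ties in the score are broken the same way, so the two outputs coincide for any `k`.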
