Understanding the meaning of a shifted sky: a general framework on extending skyline query

Skyline queries are often used on data sets in multi-dimensional space for many decision-making applications. Traditionally, an object p is said to dominate another object q if p is no worse than q on every dimension and strictly better on at least one dimension. The skyline of a data set then consists of all objects that are not dominated by any other object. To better cater to application requirements such as controlling the size of the skyline or handling data sets that are not well-structured, various works have extended the definition of skyline using variants of the dominance relationship. In view of this proliferation of variants, we propose a generalized framework to guide the extension of the skyline query from its conventional definition to different variants. Our framework explicitly and carefully examines the properties that should be preserved in a variant of the dominance relationship, so that the variant (1) maintains the original advantages while adapting to application semantics, and (2) leaves computational complexity almost unaffected. We prove that traditional dominance is the only relationship satisfying all desirable properties, and present some new dominance relationships obtained by relaxing some of the properties. These relationships are general enough for us to design new top-k skyline queries that return robust results of a controllable size. We analyze the existing skyline algorithms based on their minimum requirements on dominance properties. We also extend our analysis to data sets with missing values, and present extensive experimental results on combinations of the new dominance relationships and skyline algorithms.
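
As a concrete illustration of the traditional definition above, the following minimal Python sketch tests dominance and computes a skyline by exhaustive pairwise comparison. It is not one of the algorithms analyzed in the paper; the names `dominates` and `skyline`, the assumption that smaller values are better on every dimension, and the sample data are all introduced here for illustration only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Traditional dominance, assuming smaller is better on every dimension.

    p dominates q if p is no worse than q everywhere and strictly
    better on at least one dimension.
    """
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical sample data: (price, distance), both to be minimized.
hotels = [(50, 8), (80, 2), (60, 9), (90, 3)]
result = skyline(hotels)
```

In this sample, `(60, 9)` is dominated by `(50, 8)` and `(90, 3)` is dominated by `(80, 2)`, so only the two mutually incomparable points remain in `result`. A variant of the skyline in the sense discussed above would correspond to swapping `dominates` for a relaxed relationship while leaving `skyline` unchanged.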
