Categorical skylines for streaming data

The problem of skyline computation has attracted considerable research attention. In the categorical domain the problem becomes more complicated, primarily because the attributes of tuples take values from partially ordered domains. In this paper, we initiate a study of streaming categorical skylines. We identify the limitations of existing work on offline categorical skyline computation and propose novel techniques for maintaining the skyline of categorical data in a streaming environment. In particular, we develop a lightweight data structure for indexing the tuples in the streaming buffer, which gracefully adapts to tuples with many attributes and to partially ordered domains of any size and complexity. Additionally, our study of the dominance relation in the dual space allows us to use geometric arrangements to index the categorical skyline and to evaluate dominance queries efficiently. Lastly, a thorough experimental study evaluates the efficiency of the proposed techniques.
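The abstract does not detail the proposed buffer index or the dual-space arrangement, so the following is only a minimal sketch of the baseline notions it builds on. Each categorical domain is modeled as a partial order given by its Hasse diagram, with edges assumed to point from a preferred value to a less-preferred one. A tuple dominates another if it is equal or strictly preferred on every attribute and strictly preferred on at least one; incomparable values block dominance. The streaming buffer is assumed here to be a count-based sliding window. The names `PartialOrder`, `dominates`, `skyline`, and `NaiveSlidingSkyline` are hypothetical. The sketch recomputes the skyline from scratch on every arrival, which costs quadratic time per tuple. It is a brute-force baseline, not the paper's technique.

```python
from collections import deque
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Value = Hashable
Row = Tuple[Value, ...]


class PartialOrder:
    """Hypothetical model of a categorical domain: a Hasse diagram whose edges
    point from a preferred value to a less-preferred one (assumed acyclic)."""

    def __init__(self, edges: Sequence[Tuple[Value, Value]]) -> None:
        succ: Dict[Value, Set[Value]] = {}
        for better, worse in edges:
            succ.setdefault(better, set()).add(worse)
            succ.setdefault(worse, set())
        # Precompute the transitive closure with a DFS from every value,
        # so each preference test below is a set lookup.
        self._below: Dict[Value, Set[Value]] = {}
        for v in succ:
            seen: Set[Value] = set()
            stack = list(succ[v])
            while stack:
                w = stack.pop()
                if w not in seen:
                    seen.add(w)
                    stack.extend(succ[w])
            self._below[v] = seen

    def prefers(self, a: Value, b: Value) -> bool:
        """True if a is strictly preferred to b in this partial order."""
        return b in self._below.get(a, set())


def dominates(t: Row, u: Row, domains: Sequence[PartialOrder]) -> bool:
    """t dominates u: equal or better on every attribute, better on at least one."""
    strict = False
    for a, b, dom in zip(t, u, domains):
        if a == b:
            continue
        if dom.prefers(a, b):
            strict = True
        else:
            return False  # u is better, or the two values are incomparable
    return strict


def skyline(rows: Sequence[Row], domains: Sequence[PartialOrder]) -> List[Row]:
    """Brute-force skyline: keep every row that no other row dominates."""
    # dominates(t, t) is False, so a row never eliminates itself.
    return [t for t in rows if not any(dominates(u, t, domains) for u in rows)]


class NaiveSlidingSkyline:
    """Assumed count-based window; recomputes the skyline on each arrival."""

    def __init__(self, window_size: int, domains: Sequence[PartialOrder]) -> None:
        self.window: deque = deque(maxlen=window_size)
        self.domains = domains

    def insert(self, row: Row) -> List[Row]:
        self.window.append(row)  # the oldest tuple expires automatically
        return skyline(list(self.window), self.domains)
```

For example, suppose the first domain prefers `"x"` to both `"y"` and `"z"`, and `"y"` and `"z"` are incomparable. Suppose the second domain is the chain `1 > 2 > 3`. Under a window of three tuples, the stream `("y", 2), ("z", 2), ("x", 3)` keeps all three tuples in the skyline. No pair is comparable on both attributes in the same direction. When `("x", 1)` arrives, `("y", 2)` expires, and the newcomer dominates both remaining tuples, so it is the sole skyline member. The wasteful full recomputation in `insert` is the cost that a dedicated index over the streaming buffer is meant to avoid.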
