Stabbing the sky: efficient skyline computation over sliding windows

We consider the problem of efficiently computing the skyline against the most recent $N$ elements seen so far in a data stream. Specifically, we study $n$-of-$N$ skyline queries, that is, computing the skyline of the most recent $n$ elements for every $n \le N$. First, we develop an effective pruning technique that minimizes the number of elements to be kept. It can be shown that, if the data distribution on each dimension is independent, storing on average only $O(\log^d N)$ of the most recent $N$ elements suffices to support the precise computation of all $n$-of-$N$ skyline queries in a $d$-dimensional space. Second, we propose a novel encoding scheme for the stored elements, together with efficient update techniques, so that an $n$-of-$N$ skyline query in a $d$-dimensional space takes $O(\log N + s)$ time, reduced to $O(d \log\log N + s)$ if the data distribution is independent, where $s$ is the number of skyline points. Third, we provide a novel trigger-based technique for processing continuous $n$-of-$N$ skyline queries that takes $O(\delta)$ time to update the current result per new data element and $O(\log s)$ time to update the trigger list per result change, where $\delta$ is the number of element changes from the current result to the new result. Finally, we extend our techniques to computing the skyline against an arbitrary window within the most recent $N$ elements. Besides theoretical performance guarantees, our extensive experiments demonstrate that the new techniques can support online skyline query computation over very rapid data streams.
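
The pruning rule and the per-query check can be illustrated with a small brute-force sketch. It assumes that smaller values are preferred on every dimension and that a point dominates another if it is no worse everywhere and strictly better somewhere; the names `NofNSkyline`, `Kept`, and `crit` are hypothetical. An element is discarded as soon as a newer element dominates it, since any window of the most recent $n$ elements that contains it also contains its newer dominator. The sketch uses linear scans, so it neither achieves the $O(\log N + s)$ query bound nor reproduces the encoding scheme; mapping each kept element to an interval (crit, pos] and answering a query by stabbing that interval at the window start is an illustrative reading of the title, not necessarily the authors' exact construction.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a dominates b, assuming smaller values are preferred on every dimension."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


@dataclass
class Kept:
    pos: int      # arrival position of the element (1-based)
    point: Point
    crit: int     # position of the youngest older element dominating it, 0 if none


class NofNSkyline:
    """Brute-force sketch of n-of-N skyline maintenance (hypothetical names).

    Only elements not dominated by any newer element are kept; a query for the
    most recent n elements "stabs" the intervals (crit, pos] at the window start.
    """

    def __init__(self, N: int):
        self.N = N
        self.M = 0                    # number of elements seen so far
        self.kept: List[Kept] = []    # kept elements, oldest first

    def insert(self, p: Point) -> None:
        self.M += 1
        start = self.M - self.N + 1
        # Expire elements no longer among the most recent N, and drop elements the
        # newcomer dominates: any window containing them also contains the newcomer.
        self.kept = [k for k in self.kept
                     if k.pos >= start and not dominates(p, k.point)]
        # p is a skyline point of a window iff the window starts after p's youngest
        # older dominator and no later than p itself. Scan newest first to find it.
        crit = 0
        for k in reversed(self.kept):
            if dominates(k.point, p):
                crit = k.pos
                break
        self.kept.append(Kept(self.M, p, crit))

    def query(self, n: int) -> List[Point]:
        """Skyline of the most recent n elements, for 1 <= n <= N."""
        if not 1 <= n <= self.N:
            raise ValueError("n must satisfy 1 <= n <= N")
        start = max(1, self.M - n + 1)   # position of the oldest element in the window
        return [k.point for k in self.kept if k.crit < start <= k.pos]
```

For example, with $N = 4$ and the stream (3, 3), (1, 4), (2, 2), (4, 1), (5, 5), the first point is pruned on arrival of (2, 2), and `query(3)` returns [(2, 2), (4, 1)]. Because dominance is transitive, a stored `crit` never has to be revised in this sketch: if a newcomer prunes an element's youngest dominator, it also dominates, and prunes, that element; if the dominator expires, `crit` falls before every possible window start.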
