Continuous Adaptive Mining the Thin Skylines over Evolving Data Stream

Skyline queries return the objects that are not dominated by any other object, where one object dominates another if it is better than or equal to it in all dimensions and strictly better in at least one. Such queries are useful in many decision-making and real-time monitoring applications. As the number of dimensions grows and large volumes of data arrive continuously, mining thin skylines over a data stream while controlling the loss in result quality becomes a more meaningful problem. In this paper, we first propose a novel concept, called the thin skyline, in which a single skyline object represents its nearby skyline neighbors within ε-distance (the acceptable difference). We then develop two algorithms that prune the skyline objects lying within the acceptable difference and use a correlation coefficient to adaptively adjust the quality of the thin skyline query. Our experimental performance study shows that the proposed methods are both efficient and effective.
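To make the definitions above concrete, the following is a minimal static sketch, not the paper's algorithms. It assumes that smaller values are better in every dimension, that the ε-distance is Euclidean, and that representatives are chosen greedily in sorted order; the function names `dominates`, `skyline`, and `thin_skyline` are hypothetical. Stream handling and the correlation-coefficient adaptation mentioned in the abstract are omitted.

```python
from math import dist  # Euclidean distance, Python 3.8+


def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere.

    Assumption: smaller values are better in every dimension.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Naive O(n^2) skyline: keep the points no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def thin_skyline(points, eps):
    """Greedy thinning (an assumed strategy, not the paper's method).

    A skyline point is kept as a representative only if it lies farther
    than eps from every representative chosen so far; otherwise it is
    treated as represented by a nearby one.
    """
    reps = []
    for p in sorted(skyline(points)):
        if all(dist(p, r) > eps for r in reps):
            reps.append(p)
    return reps


# Illustrative data only; (7, 7) is dominated by (3, 5).
points = [(1, 9), (1.2, 8.9), (3, 5), (3.1, 4.9), (6, 2), (7, 7), (9, 1)]
print(skyline(points))
print(thin_skyline(points, eps=0.5))
```

With ε = 0.5, the near-duplicate skyline points (1.2, 8.9) and (3.1, 4.9) fall within the acceptable difference of (1, 9) and (3, 5) respectively, so the six-point skyline thins to four representatives. A larger ε trades more result quality for a smaller answer.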
