On Efficient Processing of Subspace Skyline Queries on High Dimensional Data

Recent studies on efficiently answering subspace skyline queries can be separated into two approaches. The first focused on pre-materializing a set of skylines points in various subspaces while the second focus on dynamically answering the queries by using a set of anchors to prune off skyline points through spatial reasoning. Despite effort to compress the pre-materialized subspace skylines through removal of redundancy, the storage space for the first approach remain exponential in the number of dimensions. The query time for the second approach on the other hand also grow substantially for data with higher dimensionality where the pruning power of anchors become much weaker. In this paper, we propose methods for answering subspace skyline query on high dimensional data such that both prematerialization storage and query time can be moderated. We propose novel notions of maximal partial-dominating space, maximal partial-dominated space and the maximal equality space between pairs of skyline objects in the full space and use these concepts as the foundation for answering subspace skyline queries for high dimensional data. Query processing involves mostly simple pruning operations while skyline computation is done only on a small subset of candidate skyline points in the subspace. We also develop a random sampling method to compute the subspace skyline in an on-line fashion. Extensive experiments have been conducted and demonstrated the efficiency and effectiveness of our methods.

[1]  Kian-Lee Tan,et al.  Stratified computation of skylines with partially-ordered domains , 2005, SIGMOD '05.

[2]  Moni Naor,et al.  Optimal aggregation algorithms for middleware , 2001, PODS '01.

[3]  Jian Pei,et al.  SUBSKY: Efficient Computation of Skylines in Subspaces , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[4]  Jian Pei,et al.  Catching the Best Views of Skyline: A Semantic Approach Based on Decisive Subspaces , 2005, VLDB.

[5]  Qing Liu,et al.  Efficient Computation of the Skyline Cube , 2005, VLDB.

[6]  Jan Chomicki,et al.  Skyline with presorting , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[7]  Donald Kossmann,et al.  Shooting Stars in the Sky: An Online Algorithm for Skyline Queries , 2002, VLDB.

[8]  Anthony K. H. Tung,et al.  DADA: a data cube for dominant relationship analysis , 2006, SIGMOD Conference.

[9]  Tian Xia,et al.  Refreshing the sky: the compressed skycube with efficient support for frequent updates , 2006, SIGMOD Conference.

[10]  Anthony K. H. Tung,et al.  Discovering strong skyline points in high dimensional spaces , 2005, CIKM '05.

[11]  Anthony K. H. Tung,et al.  Localized signature table: fast similarity search on transaction data , 2004, CIKM '04.

[12]  Jarek Gryz,et al.  Maximal Vector Computation in Large Data Sets , 2005, VLDB.

[13]  Hongjun Lu,et al.  Stabbing the sky: efficient skyline computation over sliding windows , 2005, 21st International Conference on Data Engineering (ICDE'05).

[14]  Wolf-Tilo Balke,et al.  Multi-objective Query Processing for Database Systems , 2004, VLDB.

[15]  Wolf-Tilo Balke,et al.  Efficient Distributed Skylining for Web Information Systems , 2004, EDBT.

[16]  Philip S. Yu,et al.  A new method for similarity indexing of market basket data , 1999, SIGMOD '99.

[17]  Jon M. Kleinberg,et al.  A Microeconomic View of Data Mining , 1998, Data Mining and Knowledge Discovery.

[18]  Bernhard Seeger,et al.  An optimal and progressive algorithm for skyline queries , 2003, SIGMOD '03.

[19]  Beng Chin Ooi,et al.  Efficient Progressive Skyline Computation , 2001, VLDB.

[20]  D J Hand REVIEW OF DATA MINING , 1998 .

[21]  Donald Kossmann,et al.  The Skyline operator , 2001, Proceedings 17th International Conference on Data Engineering.

[22]  Anthony K. H. Tung,et al.  Finding k-dominant skylines in high dimensional space , 2006, SIGMOD Conference.