Processing relaxed skylines in PDMS using distributed data summaries

Peer Data Management Systems (PDMS) are a natural extension of heterogeneous database systems. One of the main tasks in such systems is efficient query processing. Insisting on complete answers, however, means querying almost every peer in the network. Relaxing these completeness requirements by applying approximate query answering techniques can significantly reduce costs. Since most users are not interested in the exact answers to their queries, rank-aware query operators such as top-k or skyline play an important role in query processing. In this paper, we present the novel concept of relaxed skylines, which combines the advantages of rank-aware query operators and approximate query processing techniques. Furthermore, we propose a strategy for processing relaxed skylines in distributed environments that provides guarantees on the completeness of the result, using distributed data summaries as routing indexes.
