Skyline-join query processing in distributed databases

The skyline-join operator, an important variant of the skyline operator, plays a key role in multi-criteria decision-making problems. However, as the data scale increases, existing methods for skyline-join queries can no longer be applied to new applications. This paper therefore makes the first attempt to propose a scalable method for processing skyline-join queries in distributed databases. First, a tailored distributed framework is presented to facilitate the computation of skyline-join queries. Second, the distributed skyline-join query algorithm (DSJQ) is designed to process skyline-join queries. DSJQ consists of two phases. In the first phase, two filtering strategies are used to remove unpromising tuples from the original tables. The remaining tuples are transmitted to the corresponding data nodes according to a partition function, which guarantees that tuples with the same join value are transferred to the same node. In the second phase, we design a rotation-based scheduling plan to compute the final skyline-join result. The scheduling plan ensures that the computation is assigned evenly to all data nodes and that the computation on each data node proceeds in parallel, so that no node becomes a bottleneck. Finally, the effectiveness of DSJQ is evaluated through a series of experiments.
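The partitioning guarantee described above can be illustrated with a minimal sketch. This is not the DSJQ algorithm itself: the two filtering strategies and the rotation-based schedule are not specified in the abstract and are not reproduced here. All names (`partition`, `dominates`, `skyline`, `skyline_join`) are hypothetical, smaller attribute values are assumed to be better, the nodes are simulated in a single process, and a naive final merge stands in for the second phase.

```python
import zlib
from collections import defaultdict


def partition(join_value, num_nodes):
    """Map a join value to a node id (assumed hash-based partition function).

    Equal join values always map to the same node, so every join can be
    evaluated locally on one node.
    """
    return zlib.crc32(str(join_value).encode("utf-8")) % num_nodes


def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Naive quadratic skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def skyline_join(r, s, num_nodes):
    """Skyline over the equijoin of r and s, each a list of (join_value, attrs)."""
    nodes_r = defaultdict(list)
    nodes_s = defaultdict(list)
    # Phase 1 (simplified): ship each tuple to the node chosen by the partition function.
    for key, attrs in r:
        nodes_r[partition(key, num_nodes)].append((key, attrs))
    for key, attrs in s:
        nodes_s[partition(key, num_nodes)].append((key, attrs))

    # Phase 2 (naive stand-in): join and compute a local skyline on each node,
    # then merge the local candidates with one final skyline pass.
    candidates = []
    for node in range(num_nodes):
        joined = [
            r_attrs + s_attrs
            for r_key, r_attrs in nodes_r[node]
            for s_key, s_attrs in nodes_s[node]
            if r_key == s_key
        ]
        candidates.extend(skyline(joined))
    return skyline(candidates)
```

Because a tuple that is dominated within its own node is also dominated globally, merging the local skylines yields the same result as computing the skyline over the full join, regardless of how many nodes are used. The final merge here runs in one place, which is exactly the kind of bottleneck a distributed schedule would aim to avoid.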
