Probabilistic Skyline Computation on Vertically Distributed Uncertain Data

The skyline query is an important operator in the database community. Recently, owing to the inherent uncertainty of data in some applications, skyline queries on uncertain data have been widely studied using probabilistic models such as the p-skyline. When uncertain data is vertically distributed among multiple servers, the main goal of p-skyline computation is to minimize the number of records retrieved from the servers to the local client, because expensive network communication dominates the cost. In this paper, we present three communication-efficient p-skyline algorithms, ASR, IASR and FSLR, for vertically distributed uncertain data. ASR alternates sorted and random accesses to retrieve records from the servers and repeats a retrieving-bounding-checking iteration until every object can be determined to be in or out of the p-skyline result. The communication cost of the instances that are never retrieved is saved. IASR is an improved version of ASR: by examining the net gain of each retrieving-bounding-checking iteration, IASR terminates the iteration early to further reduce the communication cost. Unlike ASR and IASR, FSLR performs random accesses only on demand. FSLR first conducts sorted accesses to obtain loose upper bounds on the skyline probabilities of the instances. It then uses random accesses to complete part of the retrieved instances, obtaining tighter upper and lower bounds on the skyline probabilities, until the p-skyline result is computed. Our experimental results demonstrate that ASR, IASR and FSLR significantly outperform the intuitive method for p-skyline computation on vertically distributed uncertain data.
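
For readers unfamiliar with the p-skyline, the following is a minimal centralized sketch of the definition commonly used in the probabilistic skyline literature, not an implementation of ASR, IASR or FSLR. It assumes that each uncertain object is a set of instances with probabilities summing to at most 1, that objects are independent, and that smaller values are preferred on every dimension. The function names and the toy data are hypothetical and chosen only for illustration.

```python
from math import prod


def dominates(a, b):
    """True if point a is no worse than b everywhere and strictly better somewhere.

    Assumption: smaller values are preferred on every dimension.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_probability(objects, name):
    """Probability that the named object is not dominated by any other object.

    objects maps an object name to a list of (point, probability) instances.
    """
    total = 0.0
    for point, p in objects[name]:
        # For each other object, the chance that it does NOT dominate this instance.
        not_dominated = prod(
            1.0 - sum(q for other_point, q in instances if dominates(other_point, point))
            for other, instances in objects.items()
            if other != name
        )
        total += p * not_dominated
    return total


def p_skyline(objects, threshold):
    """Names of the objects whose skyline probability is at least the threshold."""
    return sorted(n for n in objects if skyline_probability(objects, n) >= threshold)


# Hypothetical toy data: three uncertain objects in two dimensions.
toy_objects = {
    "A": [((1, 4), 0.5), ((3, 3), 0.5)],
    "B": [((2, 2), 1.0)],
    "C": [((4, 5), 0.6), ((1, 1), 0.4)],
}

if __name__ == "__main__":
    for name in toy_objects:
        print(name, round(skyline_probability(toy_objects, name), 3))
    print(p_skyline(toy_objects, 0.5))
```

This brute-force computation needs every instance of every object at the client. That is the cost the distributed algorithms aim to avoid by bounding skyline probabilities from partially retrieved records.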
