Secure Computation of Skyline Query in MapReduce

Selecting representative objects from a large-scale database is an important step toward understanding it. A skyline query, which retrieves the set of non-dominated objects, is one of the popular methods for selecting representative objects. In this paper, we consider a distributed algorithm for computing a skyline query in order to handle “big data”. In conventional distributed algorithms for computing a skyline query, the values of each object in a local database have to be disclosed to other parties. Given growing awareness of database privacy, such disclosure of private information is often not allowed. In this work, we propose a novel approach that computes the skyline in a multi-party computing environment without disclosing the individual values of objects to another party. Our method is designed to work in the MapReduce framework, specifically in Hadoop. Our experimental results confirm the effectiveness and scalability of the proposed secure skyline computation.
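To make the notion of a non-dominated object concrete, the following is a minimal, non-secure sketch of a skyline query. It is not the protocol proposed in this paper: it assumes that smaller values are preferred in every dimension, and the names `dominates` and `skyline` and the sample (price, distance) data are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a dominates b (assumption: smaller is better).

    a dominates b when a is no worse than b in every dimension
    and strictly better in at least one dimension.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance) pairs, lower is better in both.
hotels = [(50, 8), (60, 3), (80, 1), (70, 5), (90, 4)]
result = skyline(hotels)
print(result)
```

Here (70, 5) and (90, 4) are both dominated by (60, 3), so only the remaining three points form the skyline. Note that this plain version compares raw attribute values directly, which is exactly the kind of disclosure a secure multi-party computation has to avoid.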
