Efficient and Secure Skyline Queries Over Vertical Data Federation

Skyline is a primitive operation in multi-objective decision applications, and there is a growing demand to support such operations over a data federation, where the entire dataset is separately held by multiple data providers (a.k.a. silos). Data federations notably increase the amount of data available for data-intensive applications such as commercial recommendation and location-based services. Yet they also challenge the conventional implementation of skyline queries, because the raw data cannot be shared within the federation and secure computation across silos can be two or three orders of magnitude slower than plaintext computation. These constraints render existing solutions inefficient on data federations. In this work, we propose a novel local dominance based framework for efficient skyline queries over a vertical data federation. We decompose the skyline query into plaintext local dominance computations and secure result aggregations, so that as many computations as possible are performed in plaintext without compromising security. We further propose a dedicated private set intersection based algorithm to accelerate query processing. Extensive evaluations on both synthetic and real-world datasets show that, compared with general-purpose secure multi-party computation techniques, our solutions reduce the time cost by up to 35.4× and the communication cost by two orders of magnitude.
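To make the decomposition concrete, the following is a minimal plaintext sketch and not the paper's protocol. It assumes that every silo holds a disjoint subset of attributes for the same objects (aligned by row index), that smaller values are preferred, and that an ordinary set intersection may stand in for the secure aggregation step; in the actual setting that step would be carried out by a secure protocol such as private set intersection. The names `local_dominators` and `federated_skyline` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Row = Sequence[float]  # the attributes one silo holds for one object


def local_dominators(silo: List[Row], p: int) -> Tuple[Set[int], Set[int]]:
    """Plaintext step run inside one silo.

    Returns (weak, strict): ids of objects that are no worse than p on
    every local attribute, and the subset that is strictly better on at
    least one local attribute.
    """
    weak: Set[int] = set()
    strict: Set[int] = set()
    for q, row in enumerate(silo):
        if q == p:
            continue
        if all(x <= y for x, y in zip(row, silo[p])):
            weak.add(q)
            if any(x < y for x, y in zip(row, silo[p])):
                strict.add(q)
    return weak, strict


def federated_skyline(silos: List[List[Row]]) -> List[int]:
    """Return ids of objects that no other object dominates globally."""
    n = len(silos[0])
    skyline = []
    for p in range(n):
        local = [local_dominators(silo, p) for silo in silos]
        # ASSUMPTION: plain set operations stand in for the secure
        # aggregation across silos (e.g. a PSI-style protocol).
        weak_everywhere = set.intersection(*(w for w, _ in local))
        strict_somewhere = set.union(*(s for _, s in local))
        # q dominates p iff q is no worse in every silo and strictly
        # better in at least one of them.
        if not (weak_everywhere & strict_somewhere):
            skyline.append(p)
    return skyline


if __name__ == "__main__":
    # Hypothetical data: silo A holds price, silo B holds distance.
    silo_a = [(1,), (3,), (4,), (9,), (3,)]
    silo_b = [(9,), (3,), (4,), (1,), (3,)]
    print(federated_skyline([silo_a, silo_b]))
```

In this toy input, object 2 is dropped because objects 1 and 4 are better on both attributes, while the duplicates 1 and 4 both survive because neither is strictly better than the other. Only the per-silo dominator sets cross the silo boundary in this sketch, which is the part a secure protocol would need to protect.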
