When private set intersection meets big data: an efficient and scalable protocol

Large scale data processing brings new challenges to the design of privacy-preserving protocols: how to meet the increasing requirements of speed and throughput of modern applications, and how to scale up smoothly when data being protected is big. Efficiency and scalability become critical criteria for privacy preserving protocols in the age of Big Data. In this paper, we present a new Private Set Intersection (PSI) protocol that is extremely efficient and highly scalable compared with existing protocols. The protocol is based on a novel approach that we call oblivious Bloom intersection. It has linear complexity and relies mostly on efficient symmetric key operations. It has high scalability due to the fact that most operations can be parallelized easily. The protocol has two versions: a basic protocol and an enhanced protocol, the security of the two variants is analyzed and proved in the semi-honest model and the malicious model respectively. A prototype of the basic protocol has been built. We report the result of performance evaluation and compare it against the two previously fastest PSI protocols. Our protocol is orders of magnitude faster than these two protocols. To compute the intersection of two million-element sets, our protocol needs only 41 seconds (80-bit security) and 339 seconds (256-bit security) on moderate hardware in parallel mode.

[1]  Yehuda Lindell,et al.  Efficient Protocols for Set Intersection and Pattern Matching with Security Against Malicious and Covert Adversaries , 2008, Journal of Cryptology.

[2]  Emiliano De Cristofaro,et al.  Practical Private Set Intersection Protocols with Linear Complexity , 2010, Financial Cryptography.

[3]  Oded Goldreich,et al.  Foundations of Cryptography: Volume 2, Basic Applications , 2004 .

[4]  Dawn Xiaodong Song,et al.  Privacy-Preserving Set Operations , 2005, CRYPTO.

[5]  Quynh H. Dang,et al.  Recommendation for Applications Using Approved Hash Algorithms , 2009 .

[6]  Dan Boneh,et al.  OpenConflict: Preventing Real Time Map Hacks in Online Games , 2011, 2011 IEEE Symposium on Security and Privacy.

[7]  Emiliano De Cristofaro,et al.  Linear-Complexity Private Set Intersection Protocols Secure in Malicious Model , 2010, ASIACRYPT.

[8]  Emiliano De Cristofaro,et al.  (If) Size Matters: Size-Hiding Private Set Intersection , 2011, IACR Cryptol. ePrint Arch..

[9]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[10]  Prateek Mittal,et al.  BotGrep: Finding P2P Bots with Structured Graph Analysis , 2010, USENIX Security Symposium.

[11]  Bruce Schneier,et al.  Applied cryptography (2nd ed.): protocols, algorithms, and source code in C , 1995 .

[12]  Moni Naor,et al.  Efficient oblivious transfer protocols , 2001, SODA '01.

[13]  Xiaomin Liu,et al.  Efficient Oblivious Pseudorandom Function with Applications to Adaptive OT and Secure Computation of Set Intersection , 2009, TCC.

[14]  Michael O. Rabin,et al.  How To Exchange Secrets with Oblivious Transfer , 2005, IACR Cryptol. ePrint Arch..

[15]  Ran Canetti,et al.  Security and Composition of Multiparty Cryptographic Protocols , 2000, Journal of Cryptology.

[16]  NejdlWolfgang,et al.  Cardinality estimation and dynamic length adaptation for Bloom filters , 2010 .

[17]  Xiaomin Liu,et al.  Fast Secure Computation of Set Intersection , 2010, SCN.

[18]  Donald Beaver,et al.  Correlated pseudorandomness and the complexity of private computations , 1996, STOC '96.

[19]  Michiel H. M. Smid,et al.  On the false-positive rate of Bloom filters , 2008, Inf. Process. Lett..

[20]  Burton H. Bloom,et al.  Space/time trade-offs in hash coding with allowable errors , 1970, CACM.

[21]  Yuval Ishai,et al.  Extending Oblivious Transfers Efficiently , 2003, CRYPTO.

[22]  S. Rajsbaum Foundations of Cryptography , 2014 .

[23]  Marc Fischlin,et al.  Random Oracles with(out) Programmability , 2010, ASIACRYPT.

[24]  Benny Pinkas,et al.  Efficient Private Matching and Set Intersection , 2004, EUROCRYPT.

[25]  Philip S. Yu,et al.  Privacy-Preserving Data Mining - Models and Algorithms , 2008, Advances in Database Systems.

[26]  Emiliano De Cristofaro,et al.  Experimenting with Fast Private Set Intersection , 2012, TRUST.

[27]  Bruce Schneier,et al.  Applied cryptography : protocols, algorithms, and source codein C , 1996 .

[28]  Wolfgang Nejdl,et al.  Cardinality estimation and dynamic length adaptation for Bloom filters , 2010, Distributed and Parallel Databases.

[29]  Oded Goldreich,et al.  The Foundations of Cryptography - Volume 2: Basic Applications , 2001 .

[30]  Adi Shamir,et al.  How to share a secret , 1979, CACM.

[31]  Moti Yung,et al.  Efficient robust private set intersection , 2009, Int. J. Appl. Cryptogr..

[32]  Carmit Hazay,et al.  Efficient Set Operations in the Presence of Malicious Adversaries , 2010, Journal of Cryptology.

[33]  Mihir Bellare,et al.  Random oracles are practical: a paradigm for designing efficient protocols , 1993, CCS '93.

[34]  Panagiotis Papadimitratos,et al.  Privacy-Preserving Relationship Path Discovery in Social Networks , 2009, CANS.

[35]  Florian Kerschbaum,et al.  Outsourced private set intersection using homomorphic encryption , 2012, ASIACCS '12.

[36]  M. Burkhart,et al.  Fast Private Set Operations with SEPIA , 2012 .

[37]  Jan Camenisch,et al.  Private Intersection of Certified Sets , 2009, Financial Cryptography.

[38]  Quynh H. Dang SP 800-107. Recommendation for Applications Using Approved Hash Algorithms , 2009 .

[39]  Oded Goldreich,et al.  A randomized protocol for signing contracts , 1985, CACM.

[40]  Jonathan Katz,et al.  Private Set Intersection: Are Garbled Circuits Better than Custom Protocols? , 2012, NDSS.

[41]  Dan Boneh,et al.  Location Privacy via Private Proximity Testing , 2011, NDSS.

[42]  Emiliano De Cristofaro,et al.  Countering GATTACA: efficient and secure testing of fully-sequenced human genomes , 2011, CCS '11.