Private Collaborative Data Cleaning via Non-Equi PSI

We introduce and investigate the privacy-preserving version of collaborative data cleaning. In collaborative data cleaning, two parties reconcile their data sets to filter out misclassified data items. In the privacy-preserving (private) version, the additional security goal is that each party learns only which of its own data items are misclassified, and nothing else about the other party's data set. Private data cleaning is essentially a variation of private set intersection (PSI), and one could employ recent circuit-PSI techniques to compute misclassifications privately. However, we design, analyze, and implement three new protocols tailored to the specifics of private data cleaning that outperform a circuit-PSI-based approach. The first protocol exploits the idea that a small additional leakage (the differentially private size of the intersection of data items) allows for a reduction in complexity over circuit-PSI. The other two protocols convert the problem of finding a mismatch in data classifications into finding a match, and then follow the standard technique of using oblivious pseudorandom functions (OPRF) for computing PSI. Depending on the number of data classes, this leads to a concrete runtime improvement over circuit-PSI.
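
To make the mismatch-to-match idea concrete, the following is a minimal plaintext sketch in Python. It is not the paper's protocol and provides no privacy: an ordinary set intersection stands in for OPRF-based PSI. The reduction shown, in which one party expands each item into pairs with every class other than its own, is an assumption about one way such a conversion could work. All names (`mismatches_direct`, `expand_other_classes`, `mismatches_via_match`) and the sample data are hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple

Item = Hashable
Label = int


def mismatches_direct(mine: Dict[Item, Label], theirs: Dict[Item, Label]) -> Set[Item]:
    """Reference functionality: my items that the other party also holds
    but has assigned a different class."""
    return {x for x, c in mine.items() if x in theirs and theirs[x] != c}


def expand_other_classes(data: Dict[Item, Label], num_classes: int) -> Set[Tuple[Item, Label]]:
    """Assumed reduction step: replace each (item, class) by the pairs
    (item, c2) for every class c2 different from the item's own class."""
    return {
        (x, c2)
        for x, c in data.items()
        for c2 in range(num_classes)
        if c2 != c
    }


def mismatches_via_match(
    mine: Dict[Item, Label], theirs: Dict[Item, Label], num_classes: int
) -> Set[Item]:
    """Find mismatches by looking for matches: a pair (x, c) of mine that
    appears in the other party's expanded set means they hold x with a
    class other than c. The plain intersection below is a stand-in for a
    PSI protocol and leaks everything; it only illustrates the logic."""
    my_pairs = set(mine.items())
    their_expanded = expand_other_classes(theirs, num_classes)
    return {x for x, _ in my_pairs & their_expanded}


# Hypothetical sample data with three classes (0, 1, 2).
alice = {"img1": 0, "img2": 1, "img3": 2}
bob = {"img2": 1, "img3": 0, "img4": 2}

direct = mismatches_direct(alice, bob)
via_match = mismatches_via_match(alice, bob, num_classes=3)
print(direct == via_match)
```

In this sketch the expanded set holds `num_classes - 1` pairs per item, so its size grows with the number of classes. That is consistent with the abstract's remark that the benefit depends on the number of data classes, although the actual cost profile of the paper's protocols may differ.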
