EsPRESSo: Efficient Privacy-Preserving Evaluation of Sample Set Similarity

Electronic information is increasingly shared among entities that do not fully trust one another. To address the resulting security and privacy issues, several cryptographic techniques have emerged that support privacy-preserving information sharing and retrieval. One open problem in this context involves two parties that need to assess the similarity of their datasets but are reluctant to disclose the actual content. This paper presents an efficient and provably secure construction for the privacy-preserving evaluation of sample set similarity, where similarity is measured as the Jaccard index. We present two protocols: the first securely computes the Jaccard similarity of two sets, and the second approximates it using MinHash techniques, at lower complexity. We show that our protocols are attractive in many applications, including document and multimedia similarity, biometric authentication, and genetic tests. In the process, we demonstrate that our constructions are appreciably more efficient than prior work.
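
To make the two similarity measures concrete, the following is a minimal plaintext sketch of the exact Jaccard index and of a MinHash estimate of it. It is not the privacy-preserving protocol from the paper: both sets are visible to the same process, and no cryptography is involved. The function names, the use of seeded SHA-256 as a stand-in for min-wise independent permutations, and the convention for two empty sets are all assumptions made for illustration.

```python
import hashlib


def jaccard(a, b):
    """Exact Jaccard index |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def _seeded_hash(item, seed):
    """64-bit hash of an item under a given seed.

    Seeded SHA-256 is an illustrative stand-in for a random permutation.
    """
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def minhash_signature(items, k):
    """Signature of k minima, one per seeded hash function."""
    items = set(items)
    if not items:
        raise ValueError("cannot build a MinHash signature of an empty set")
    return [min(_seeded_hash(x, seed) for x in items) for seed in range(k)]


def minhash_estimate(sig_a, sig_b):
    """Fraction of signature positions on which the two signatures agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    doc_a = {f"w{i}" for i in range(100)}       # w0 .. w99
    doc_b = {f"w{i}" for i in range(50, 150)}   # w50 .. w149
    exact = jaccard(doc_a, doc_b)
    approx = minhash_estimate(
        minhash_signature(doc_a, 256), minhash_signature(doc_b, 256)
    )
    print(f"exact={exact:.3f} approx={approx:.3f}")
```

The estimate compares only k signature values per set rather than the full sets, which is where the lower cost of the approximate approach comes from; a larger k lowers the estimation error at the price of more work.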
