Bloom Filter Bootstrap: Privacy-Preserving Estimation of the Size of an Intersection

This paper proposes a new privacy-preserving scheme for estimating the size of the intersection of two given secret subsets. Given the inner product of the Bloom filters (BFs) of the two sets, the proposed scheme applies Bayesian estimation under the assumption of a beta distribution as the prior probability of the size to be estimated. The BF keeps the communication complexity low, and the Bayesian estimation improves the estimation accuracy. A possible application of the proposed protocol is to epidemiological datasets with two attributes, Helicobacter pylori infection and stomach cancer. Assuming that information on Helicobacter pylori infection and information on stomach cancer are collected separately, the protocol demonstrates that a χ²-test can be performed without disclosing the contents of the two confidential databases.
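To illustrate why the inner product of two Bloom filters carries information about the intersection size, here is a minimal Python sketch. It is not the scheme proposed in the paper: it replaces the Bayesian estimator with a beta prior by a simple, well-known fill-ratio estimator, and it computes the inner product in the clear instead of through a private scalar product protocol. All function names, the SHA-256 based hashing, and the parameters `m` (filter length) and `k` (number of hash functions) are assumptions made for the example. The last function shows how an intersection count, together with the two set sizes and the population size, determines the Pearson χ² statistic of a 2×2 contingency table.

```python
import hashlib
import math


def bloom_filter(items, m, k):
    """Return an m-bit Bloom filter (list of 0/1) using k hash positions per item."""
    bits = [0] * m
    for item in items:
        for i in range(k):
            # Hypothetical hashing choice: salt SHA-256 with the hash index i.
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            bits[int.from_bytes(digest[:8], "big") % m] = 1
    return bits


def estimate_cardinality(ones, m, k):
    """Fill-ratio estimate of the number of items behind `ones` set bits."""
    return -(m / k) * math.log(1.0 - ones / m)


def estimate_intersection(bf_a, bf_b, k):
    """Estimate |A ∩ B| from two equal-length Bloom filters and their inner product."""
    m = len(bf_a)
    ones_a, ones_b = sum(bf_a), sum(bf_b)
    inner = sum(x & y for x, y in zip(bf_a, bf_b))  # inner product of bit vectors
    ones_union = ones_a + ones_b - inner            # set bits of the OR-ed filter
    # Inclusion-exclusion on the estimated cardinalities.
    return (estimate_cardinality(ones_a, m, k)
            + estimate_cardinality(ones_b, m, k)
            - estimate_cardinality(ones_union, m, k))


def chi_square_2x2(n, n_a, n_b, n_ab):
    """Pearson chi-square statistic of the 2x2 table defined by the counts.

    n: population size, n_a and n_b: sizes of the two sets,
    n_ab: size of their intersection.
    """
    a = n_ab                    # in A and in B
    b = n_a - n_ab              # in A only
    c = n_b - n_ab              # in B only
    d = n - n_a - n_b + n_ab    # in neither
    return n * (a * d - b * c) ** 2 / (n_a * (n - n_a) * n_b * (n - n_b))


if __name__ == "__main__":
    set_a = range(0, 1000)      # toy identifiers; true intersection size is 400
    set_b = range(600, 1600)
    k = 4
    bf_a = bloom_filter(set_a, m=16384, k=k)
    bf_b = bloom_filter(set_b, m=16384, k=k)
    size = estimate_intersection(bf_a, bf_b, k)
    print("estimated intersection size:", round(size))
    print("chi-square statistic:", chi_square_2x2(5000, 1000, 1000, size))
```

The estimate is noisy for short filters or nearly full ones. That gap in accuracy is what the paper's Bayesian estimation is meant to address.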
