Preserving privacy in association rule mining with Bloom filters

Privacy-preserving association rule mining has recently become an active research area. There have been two different approaches to this problem: perturbation-based and secure multiparty computation-based. One drawback of the perturbation-based approach is that it cannot always fully preserve individuals' privacy while achieving precise mining results. The secure multiparty computation-based approach works only in distributed environments and requires sophisticated protocols, which limits its practical use. In this paper, we propose a new approach for preserving privacy in association rule mining. The main idea is to use keyed Bloom filters to represent transactions as well as data items. The proposed approach can fully preserve privacy while maintaining the precision of mining results. We investigate the tradeoff between mining precision and storage requirements. We also propose a δ-folding technique that further reduces the storage requirement without sacrificing mining precision or running time.
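The keyed Bloom filter idea can be illustrated with a minimal Python sketch. It is not the authors' construction. It assumes HMAC-SHA256 as the keyed hash family and stores each m-bit filter as a Python integer. The function names `keyed_positions`, `encode`, `support`, `false_positive_rate`, and `optimal_k` are hypothetical. If a transaction contains an itemset, the transaction's filter has every bit of the itemset's filter set, so support counted over filters never undercounts. Hash collisions can only inflate it. Under the standard approximation, a filter of m bits holding n items with k hash functions has a false-positive rate of about (1 - e^{-kn/m})^k. This gives one way to see the precision/storage tradeoff: a larger m lowers the rate but costs more storage.

```python
import hashlib
import hmac
import math
from collections.abc import Iterable

# Sketch only: HMAC-SHA256 stands in for the keyed hash family (an assumption).
def keyed_positions(item: str, key: bytes, m: int, k: int) -> list[int]:
    """Map an item to k bit positions in [0, m); without `key` they cannot be recomputed."""
    positions = []
    for i in range(k):
        # Derive the i-th hash by prefixing the index to the item before MACing it.
        digest = hmac.new(key, f"{i}:{item}".encode("utf-8"), hashlib.sha256).digest()
        positions.append(int.from_bytes(digest[:8], "big") % m)
    return positions


def encode(items: Iterable[str], key: bytes, m: int, k: int) -> int:
    """Return the keyed Bloom filter of a set of items as an m-bit integer bitmask."""
    bits = 0
    for item in items:
        for p in keyed_positions(item, key, m, k):
            bits |= 1 << p
    return bits


def support(itemset_filter: int, transaction_filters: list[int]) -> int:
    """Count transactions whose filter has every bit of the itemset's filter set.

    A true containment always passes (no false negatives); collisions can add
    false positives, so the count may overestimate the true support.
    """
    return sum(1 for t in transaction_filters if (t & itemset_filter) == itemset_filter)


def false_positive_rate(m: int, k: int, n: int) -> float:
    """Standard approximation (1 - e^{-kn/m})^k for a filter holding n items."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m: int, n: int) -> int:
    """Hash count minimizing the approximate rate: round((m/n) ln 2), at least 1."""
    return max(1, round(m / n * math.log(2)))


if __name__ == "__main__":
    key = b"owner-secret"  # hypothetical key known only to the data owner
    m, k = 64, 3
    transactions = [{"bread", "milk"}, {"bread", "butter"},
                    {"bread", "milk", "butter"}, {"beer"}]
    # The data owner ships only these integers to the miner.
    filters = [encode(t, key, m, k) for t in transactions]
    # The true support of {bread, milk} is 2; the filtered count is at least 2.
    print(support(encode({"bread", "milk"}, key, m, k), filters))
    print(false_positive_rate(m, k, n=3))
```

In this sketch, an itemset's filter equals the bitwise OR of its items' filters. A miner holding only item and transaction filters could therefore form candidate itemsets and count their support without learning the key or the item names.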
