A high collusion-resistant approach to distributed privacy-preserving data mining

Data mining across different companies, organizations, online shops, or the like is necessary to discover valuable patterns, associations, trends, or dependencies in their shared data. Privacy, however, is a concern: in many situations, data mining must be conducted without violating the privacy of any participant. In response to this requirement, this paper proposes an effective distributed privacy-preserving data mining approach called CRDM (Collusion-Resistant Data Mining). CRDM is characterized by its ability to resist collusion. Let M be the number of sites participating in data mining. Privacy cannot be violated unless at least M - 1 sites collude. Results of both analytical and experimental performance studies demonstrate the effectiveness of CRDM.
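The abstract does not describe how CRDM works internally, so the sketch below is not the CRDM protocol. It is a generic additive secret-sharing secure sum, included only to illustrate what a collusion threshold of M - 1 means in practice. The names `MODULUS`, `split_into_shares`, and `secure_sum`, and the example support counts, are all assumptions made for this illustration.

```python
import secrets

# Assumed public modulus; it only needs to exceed any possible true total.
MODULUS = 2**61 - 1


def split_into_shares(value, num_sites, modulus=MODULUS):
    """Split value into num_sites random shares that sum to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(num_sites - 1)]
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(local_counts, modulus=MODULUS):
    """Simulate M sites computing a global count without revealing local counts.

    shares[i][j] is the share that site i hands to site j (site i keeps
    shares[i][i] for itself). Each site publishes only the sum of the
    shares it holds, and the published sums add up to the global total.
    """
    m = len(local_counts)
    shares = [split_into_shares(c, m, modulus) for c in local_counts]
    published = [sum(shares[i][j] for i in range(m)) % modulus for j in range(m)]
    return sum(published) % modulus


# Hypothetical local support counts of one itemset at M = 4 sites.
local_counts = [120, 45, 300, 78]
total = secure_sum(local_counts)
```

In this toy scheme, any M - 2 colluding sites can learn at most the combined count of the two remaining sites, not either count individually. Once M - 1 sites collude, the last site's count follows directly from the global total minus the colluders' own counts, which is why no protocol that reveals the total can tolerate more than M - 2 colluders.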
