A Novel Privacy Preserving Association Rule Mining using Hadoop

Hadoop is a popular open-source distributed system that can process large-scale data. Data mining, meanwhile, is a technique for finding patterns and extracting knowledge from data sets, and it makes massive data processing more useful when combined with the Hadoop framework. However, data mining poses a potential threat to privacy. Although numerous studies have addressed this problem, they remain insufficient and have drawbacks such as the trade-off between privacy and data utility. In this paper, we focus on privacy-preserving data mining, particularly for association rule mining, a representative data mining algorithm. We propose a novel privacy-preserving association rule mining algorithm in Hadoop that prevents privacy violations without loss of data utility. Experimental results show that the proposed technique prevents the exposure of sensitive data without degrading data utility.

Keywords: Privacy preserving data mining; Association rule
