Dataless Data Mining: Association Rules-Based Distributed Privacy-Preserving Data Mining

Today, the desire to mine data from varied sources to discover the behaviors and patterns of entities such as customers, diseases, and environmental conditions is on the rise. At the same time, resistance to sharing data is also on the rise, owing to increasing governmental regulation and individuals' desire to preserve privacy. In this paper, we employ association rule mining to preserve individual data privacy without overly compromising the accuracy of the global data mining task. We describe the proposed methodology and show that the scheme is privacy preserving. The methodology is tested on three commonly available data sets. The results validate our claims that synthetic data can represent the original data accurately without compromising privacy.
