PPDAM: Privacy-Preserving Distributed Association-Rule-Mining Algorithm

Data mining analyzes large volumes of digital data to discover hidden but useful patterns. Because these patterns carry statistical meaning, discovering them can disclose sensitive information, and privacy has therefore become one of the prime concerns of the data-mining research community. Distributed association mining discovers association rules by combining local models from several distributed sites, so privacy breaches are more likely than in centralized environments. In this work, we present a methodology that generates association rules without revealing confidential inputs, such as the statistical properties of individual sites, while retaining a high level of accuracy in the resulting rules. An important outcome of the proposed technique is lower overall communication cost: our performance evaluation shows a significant reduction compared with other well-known distributed association-rule-mining algorithms. Moreover, the global rule model generated by the proposed method is based on the exact global support of each item set, which diminishes the inconsistency that arises when global models are generated from the partial support count of an item set.
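
To illustrate how an exact global support can be obtained without any site disclosing its local support count, the sketch below uses a generic masked secure-sum ring. This is an assumption made for illustration only and is not the PPDAM protocol itself, which the abstract does not specify. The names `secure_sum` and `global_support`, the modulus, and the sample counts are all hypothetical.

```python
import random

def secure_sum(local_counts, modulus=2**32, rng=None):
    """Sum per-site counts so that no site sees another site's raw count.

    Illustrative masked ring sum (an assumption, not the paper's protocol):
    the initiator adds a secret random mask, each site adds its own count
    to the running total it receives, and the initiator removes the mask.
    """
    rng = rng or random.SystemRandom()
    mask = rng.randrange(modulus)          # known only to the initiator
    running = mask
    for count in local_counts:             # each iteration models one site
        running = (running + count) % modulus
    return (running - mask) % modulus      # exact total, mask removed

def global_support(local_counts, local_sizes):
    """Exact global support = total item-set count / total transactions."""
    return secure_sum(local_counts) / secure_sum(local_sizes)

# Hypothetical counts of one item set at three sites, and site sizes.
counts = [120, 45, 310]
sizes = [1000, 500, 2000]
print(global_support(counts, sizes))
```

Because the total is computed over every site's count, the resulting support is exact rather than estimated from partial counts; the sketch ignores collusion between sites and assumes the true total is smaller than the modulus.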
