Privacy-preserving association rule mining in large-scale distributed systems

Data privacy is a major concern that threatens the widespread deployment of data Grids in domains such as health care and finance. We propose a unique approach for obtaining knowledge, in the form of a data mining model, from a data Grid while ensuring that the data is cryptographically safe. This is made possible by an innovative yet natural generalization of the accepted trusted third party model and by a new privacy-preserving data mining algorithm that is suitable for Grid-scale systems. The algorithm is asynchronous, involves no global communication patterns, and dynamically adjusts to changes in the data and to the failure and recovery of resources. To the best of our knowledge, this is the first privacy-preserving mining algorithm to possess these features. Simulations of thousands of resources show that our algorithm quickly converges to the correct result while using reasonable communication. The simulations also show that the effect of the privacy parameter on both the convergence time and the number of messages is logarithmic.