Association rule mining in peer-to-peer systems

We extend the problem of association rule mining, a key data mining problem, to systems in which the database is partitioned among a very large number of computers that are dispersed over a wide area. Such computing systems include grid computing platforms, federated database systems, and peer-to-peer computing environments. The scale of these systems poses several difficulties: global communication and global synchronization are impractical, the network topology changes dynamically, data are updated on the fly, resources must be shared with other applications, and resources fail and recover frequently. We present an algorithm by which every node in the system can reach the exact solution, as if it were given the combined database. The algorithm is entirely asynchronous, imposes very little communication overhead, transparently tolerates network topology changes and node failures, and quickly adjusts to changes in the data as they occur. Simulations of up to 10,000 nodes show that the algorithm is local: all rules, except for those whose confidence is about equal to the confidence threshold, are discovered using information gathered from a very small vicinity, whose size is independent of the size of the system.
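To make "the exact solution, as if it were given the combined database" concrete, the following minimal sketch shows what that target result looks like on a toy partitioned database. It is not the asynchronous, local algorithm described above: it simply sums per-node itemset counts in one place, which is exactly the global communication the paper sets out to avoid. The function names, the toy transactions, and the threshold values `min_freq` and `min_conf` are all hypothetical, and the rule definition used is the standard frequency/confidence formulation, assumed here for illustration.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, max_size=2):
    """Count every itemset of up to max_size items in one node's partition."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[frozenset(itemset)] += 1
    return counts


def find_rules(counts, n_transactions, min_freq, min_conf):
    """Return rules (lhs, rhs) that meet both thresholds.

    Frequency of a rule is count(lhs | rhs) / n_transactions;
    confidence is count(lhs | rhs) / count(lhs).
    """
    rules = set()
    for itemset, count in counts.items():
        if len(itemset) < 2 or count / n_transactions < min_freq:
            continue
        for item in itemset:
            lhs = itemset - {item}
            if count / counts[lhs] >= min_conf:
                rules.add((lhs, frozenset({item})))
    return rules


# Hypothetical database split across three nodes.
partitions = [
    [["bread", "milk"], ["bread", "butter"]],
    [["bread", "milk", "butter"], ["milk"]],
    [["bread", "milk"]],
]
n_total = sum(len(p) for p in partitions)

# Summing the per-node counts (centralized here, for illustration only).
merged = Counter()
for partition in partitions:
    merged.update(local_counts(partition))

# Counting directly over the combined database.
combined = local_counts([t for p in partitions for t in p])

rules_from_merged = find_rules(merged, n_total, min_freq=0.3, min_conf=0.7)
rules_from_combined = find_rules(combined, n_total, min_freq=0.3, min_conf=0.7)

for lhs, rhs in sorted(rules_from_merged, key=lambda r: (sorted(r[0]), sorted(r[1]))):
    print(sorted(lhs), "=>", sorted(rhs))
```

Because itemset counts are additive across partitions, the rules derived from the summed counts match those derived from the combined database. The contribution claimed in the abstract is reaching this same result at every node without such a global sum.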
