Parallel Data Mining for Association Rules on Shared-memory Systems

Abstract. In this paper we present a new parallel algorithm for data mining of association rules on shared-memory multiprocessors. We study the degree of parallelism, synchronization, and data locality issues, and present optimizations for fast frequency computation. Experiments show that a significant improvement of performance is achieved using our proposed optimizations. We also achieved good speed-up for the parallel algorithm.A lot of data-mining tasks (e.g. association rules, sequential patterns) use complex pointer-based data structures (e.g. hash trees) that typically suffer from suboptimal data locality. In the multiprocessor case shared access to these data structures may also result in false sharing. For these tasks it is commonly observed that the recursive data structure is built once and accessed multiple times during each iteration. Furthermore, the access patterns after the build phase are highly ordered. In such cases locality and false sharing sensitive memory placement of these structures can enhance performance significantly. We evaluate a set of placement policies for parallel association discovery, and show that simple placement schemes can improve execution time by more than a factor of two. More complex schemes yield additional gains.

[1]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[2]  William A. Wulf,et al.  An efficient algorithm for heap storage allocation , 1988, SIGP.

[3]  Josep Torrellas,et al.  Share Data Placement Optimizations to Reduce Multiprocessor Cache Miss Rates , 1990, ICPP.

[4]  Susan J. Eggers,et al.  Eliminating False Sharing , 1991, ICPP.

[5]  Ricardo Bianchini,et al.  Software caching on cache-coherent multiprocessors , 1992, [1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing.

[6]  Anoop Gupta,et al.  Design and evaluation of a compiler algorithm for prefetching , 1992, ASPLOS V.

[7]  M.A.W. Houtsma,et al.  Set-Oriented Mining for Association Rules , 1993, ICDE 1993.

[8]  Dirk Grunwald,et al.  Improving the cache locality of memory allocation , 1993, PLDI '93.

[9]  Monica S. Lam,et al.  Global optimizations for parallelism and locality on scalable parallel machines , 1993, PLDI '93.

[10]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[11]  AgrawalRakesh,et al.  Mining association rules between sets of items in large databases , 1993 .

[12]  Chau-Wen Tseng,et al.  Compiler optimizations for improving data locality , 1994, ASPLOS VI.

[13]  R. Agarwal Fast Algorithms for Mining Association Rules , 1994, VLDB 1994.

[14]  Heikki Mannila,et al.  Efficient Algorithms for Discovering Association Rules , 1994, KDD Workshop.

[15]  Shamkant B. Navathe,et al.  An Efficient Algorithm for Mining Association Rules in Large Databases , 1995, VLDB.

[16]  Philip S. Yu,et al.  An effective hash-based algorithm for mining association rules , 1995, SIGMOD '95.

[17]  Heikki Mannila,et al.  A Perspective on Databases and Data Mining , 1995, KDD.

[18]  Wei Li,et al.  Unifying data and control transformations for distributed shared-memory machines , 1995, PLDI '95.

[19]  Jonathan Grudin,et al.  Design and evaluation , 1995 .

[20]  Denis Trystram,et al.  Parallel algorithms and architectures , 1995 .

[21]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[22]  LiWei,et al.  Unifying data and control transformations for distributed shared-memory machines , 1995 .

[23]  Andreas Mueller,et al.  Fast sequential and parallel algorithms for association rule mining: a comparison , 1995 .

[24]  T. J. Watson,et al.  An E ective Hash-Based Algorithm for Mining Association RulesJong , 1995 .

[25]  Data and Computation Transformations for Multiprocessors , 1995, PPOPP.

[26]  Philip S. Yu,et al.  Efficient parallel data mining for association rules , 1995, CIKM '95.

[27]  Wei Li,et al.  Compiler cache optimizations for banded matrix problems , 1995, ICS '95.

[28]  Monica S. Lam,et al.  Data and computation transformations for multiprocessors , 1995, PPOPP '95.

[29]  Arun N. Swami,et al.  Set-oriented mining for association rules in relational databases , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[30]  R. Agrawal,et al.  Mining quantitative association rules in large relational tables , 1996, SIGMOD '96.

[31]  Todd C. Mowry,et al.  Compiler-based prefetching for recursive data structures , 1996, ASPLOS VII.

[32]  Srinivasan Parthasarathy,et al.  Parallel Data Mining for Association Rules on Shared-Memory Multi-Processors , 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.

[33]  David Wai-Lok Cheung,et al.  Efficient Mining of Association Rules in Distributed Databases , 1996, IEEE Trans. Knowl. Data Eng..

[34]  Masaru Kitsuregawa,et al.  Hash based parallel algorithms for mining association rules , 1996, Fourth International Conference on Parallel and Distributed Information Systems.

[35]  Jiawei Han,et al.  A fast distributed algorithm for mining association rules , 1996, Fourth International Conference on Parallel and Distributed Information Systems.

[36]  Ramakrishnan Srikant,et al.  Mining quantitative association rules in large relational tables , 1996, SIGMOD '96.

[37]  T. Mowry,et al.  Compiler-based prefetching for recursive data structures , 1996, ASPLOS VII.

[38]  Arbee L. P. Chen,et al.  An efficient approach to discovering knowledge from large databases , 1996, Fourth International Conference on Parallel and Distributed Information Systems.

[39]  Rakesh Agrawal,et al.  Parallel Mining of Association Rules , 1996, IEEE Trans. Knowl. Data Eng..

[40]  Heikki Mannila,et al.  Fast Discovery of Association Rules , 1996, Advances in Knowledge Discovery and Data Mining.

[41]  Hannu Toivonen,et al.  Sampling Large Databases for Association Rules , 1996, VLDB.

[42]  Dimitrios Gunopulos,et al.  Discovering All Most Specific Sentences by Randomized Algorithms , 1997, ICDT.

[43]  Mohammed J. Zaki,et al.  Compile-Time Scheduling Algorithms for a Heterogeneous Network of Workstations , 1997, Comput. J..

[44]  Rajeev Motwani,et al.  Dynamic itemset counting and implication rules for market basket data , 1997, SIGMOD '97.

[45]  Ramakrishnan Srikant,et al.  Mining generalized association rules , 1995, Future Gener. Comput. Syst..

[46]  Srinivasan Parthasarathy,et al.  A localized algorithm for parallel association mining , 1997, SPAA '97.

[47]  Srinivasan Parthasarathy,et al.  New Algorithms for Fast Discovery of Association Rules , 1997, KDD.

[48]  Srinivasan Parthasarathy,et al.  Evaluation of sampling for data mining of association rules , 1997, Proceedings Seventh International Workshop on Research Issues in Data Engineering. High Performance Database Management for Large-Scale Applications.

[49]  Wei Li,et al.  New parallel algorithms for fast discovery of associ-ation rules , 1997 .

[50]  Vipin Kumar,et al.  Scalable parallel data mining for association rules , 1997, SIGMOD '97.

[51]  Zvi M. Kedem,et al.  Pincer-Search: A New Algorithm for Discovering the Maximum Frequent Set , 1998, EDBT.

[52]  David Wai-Lok Cheung,et al.  Asynchronous parallel algorithm for mining association rules on a shared-memory multi-processors , 1998, SPAA '98.

[53]  Jun-Lin Lin,et al.  Mining association rules: anti-skew algorithms , 1998, Proceedings 14th International Conference on Data Engineering.

[54]  Roberto J. Bayardo,et al.  Efficiently mining long patterns from databases , 1998, SIGMOD '98.

[55]  Srinivasan Parthasarathy,et al.  Memory Placement Techniques for Parallel Association Mining , 1998, KDD.

[56]  David Taniar,et al.  Parallel Data Mining , 2002 .