Parallel Mining of Frequent Closed Patterns: Harnessing Modern Computer Architectures

Inspired by emerging multi-core computer architectures, in this paper we present MT_Closed, a multi-threaded algorithm for frequent closed itemset mining (FCIM). To the best of our knowledge, this is the first parallel FCIM algorithm proposed to date. We study how the different duplicate checking techniques typical of FCIM algorithms affect parallelization, and we show that only one of them allows the global FCIM problem to be decomposed into independent tasks that can be executed in any order, and thus in parallel. We then show how MT_Closed efficiently harnesses modern CPUs. We design and test several parallelization paradigms, investigating static and dynamic decomposition and scheduling of tasks, and show the scalability of the algorithm with respect to the number of CPUs. We also analyze its cache friendliness. Finally, we obtain additional speed-up by exploiting SIMD instruction-set extensions.