Parallel leap: large-scale maximal pattern mining in a distributed environment

When computationally feasible, mining extremely large databases produces tremendously large numbers of frequent patterns. In many cases, it is impractical to mine those datasets due to their sheer size; not only the extent of the existing patterns, but mainly the magnitude of the search space. Many approaches have been suggested such as sequential mining for maximal patterns or searching for all frequent patterns in parallel. So far, those approaches are still not genuinely effective to mine extremely large datasets. In this work we propose a method that combines both strategies efficiently, i.e. mining in parallel for the set of maximal patterns which, to the best of our knowledge, has never been proposed efficiently before. Using this approach we could mine significantly large datasets; with sizes never reported in the literature before. We are able to effectively discover frequent patterns in a database made of billion transactions using a 32 processors cluster in less than 2 hours

[1]  Roberto J. Bayardo,et al.  Efficiently mining long patterns from databases , 1998, SIGMOD '98.

[2]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[3]  David Wai-Lok Cheung,et al.  Asynchronous parallel algorithm for mining association rules on a shared-memory multi-processors , 1998, SPAA '98.

[4]  Pawat Laohawee,et al.  Novel Query Expansion Technique using Apriori Algorithm , 1999, TREC.

[5]  Soon Myoung Chung,et al.  Parallel mining of maximal frequent itemsets from databases , 2003, Proceedings. 15th IEEE International Conference on Tools with Artificial Intelligence.

[6]  Masaru Kitsuregawa,et al.  Tree Structure Based Parallel Frequent Pattern Mining on PC Cluster , 2003, DEXA.

[7]  Martin Ester,et al.  Frequent term-based text clustering , 2002, KDD.

[8]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[9]  Rakesh Agrawal,et al.  Parallel Mining of Association Rules , 1996, IEEE Trans. Knowl. Data Eng..

[10]  Heikki Mannila,et al.  Inductive Databases and Condensed Representations for Data Mining , 1997, ILPS.

[11]  Jiawei Han,et al.  A fast distributed algorithm for mining association rules , 1996, Fourth International Conference on Parallel and Distributed Information Systems.

[12]  Fernando Gustavo Tinetti,et al.  Parallel programming: techniques and applications using networked workstations and parallel computers. Barry Wilkinson, C. Michael Allen , 2000 .

[13]  Haym Hirsh,et al.  Mining Associations in Text in the Presence of Background Knowledge , 1996, KDD.

[14]  Srinivasan Parthasarathy,et al.  Parallel Data Mining for Association Rules on Shared-memory Systems , 1998 .

[15]  Jiawei Han,et al.  Mining recurrent items in multimedia with progressive resolution refinement , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[16]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[17]  Alex A. Freitas,et al.  A Survey of Parallel Data Mining , 1996 .

[18]  Rakesh Agarwal,et al.  Fast Algorithms for Mining Association Rules , 1994, VLDB 1994.

[19]  Osmar R. Zaïane,et al.  Fast parallel association rule mining without candidacy generation , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[20]  Osmar R. Zaïane,et al.  Pattern lattice traversal by selective jumps , 2005, KDD '05.

[21]  Shamkant B. Navathe,et al.  An Efficient Algorithm for Mining Association Rules in Large Databases , 1995, VLDB.

[22]  Vipin Kumar,et al.  Scalable parallel data mining for association rules , 1997, SIGMOD '97.

[23]  Philip S. Yu,et al.  Efficient parallel data mining for association rules , 1995, CIKM '95.