New parallel algorithms for frequent itemset mining in very large databases

Frequent itemset mining is a classic problem in data mining. It is an unsupervised process that consists of finding frequent patterns (or itemsets) hidden in large volumes of data in order to produce compact summaries or models of the database. These models are typically used to generate association rules, but recently they have also been used in far-reaching domains such as e-commerce and bioinformatics. Because databases are growing in both dimension (number of attributes) and size (number of records), one of the main requirements for a frequent itemset mining algorithm is the ability to analyze very large databases. Sequential algorithms cannot meet this requirement, especially in terms of run-time performance. Therefore, we must rely on high-performance parallel and distributed computing. We present new parallel algorithms for frequent itemset mining. Their efficiency is demonstrated through a series of experiments on different parallel environments, ranging from shared-memory multiprocessor machines to a set of SMP clusters connected through a high-speed network. We also briefly discuss an application of our algorithms to the analysis of large databases collected by a Brazilian Web portal.
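To make the problem statement concrete, the following is a minimal sequential sketch of level-wise (Apriori-style) frequent itemset mining. It is not one of the parallel algorithms presented in this work; the function name `frequent_itemsets`, the `min_support` parameter (an absolute count), and the toy transactions are assumptions chosen for illustration only.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset that occurs in at
    least `min_support` transactions (illustrative sequential sketch)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count the support of every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Candidate generation: join frequent (k-1)-itemsets and keep only
        # k-itemsets whose (k-1)-subsets are all frequent.
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in prev for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Support counting: one pass over the database per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result


# Hypothetical toy database of five transactions.
toy_db = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
toy_result = frequent_itemsets(toy_db, min_support=3)
```

The support-counting pass scans the whole database once per level, which suggests why run time becomes the bottleneck as the number of records and attributes grows.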
