Paper information - Parallel frequent set mining using inverted matrix approach

Parallel frequent set mining using inverted matrix approach

Mining frequent patterns in large transactional databases is considered one of the most important data mining problems. The recent explosive growth in data collection has left current rule mining algorithms restricted and insufficient for analyzing excessively large transaction sets, because they suffer from many problems when mining massive transaction datasets. Some of the major problems are: (1) multiple database scans, (2) massive computational power requirements, (3) huge memory requirements, (4) lack of parallelism, and (5) lack of interactivity across different support values ([1–2]). In this paper, the inverted matrix, a new representation of the transactional database, is used and distributed among parallel nodes. Frequent items from the inverted matrix are assigned to the parallel nodes. In the parallel implementation, each parallel node generates a Co-Occurrence Frequent Item (COFI) tree for each frequent item assigned to it. The mining process is carried out by all nodes, each of which generates all frequent itemsets in which its assigned items participate. Little communication is required between the master node and the parallel nodes to generate all frequent itemsets. Two techniques are used to assign frequent items to the parallel nodes: (1) Alternate Loop Splitting (ALS) and (2) Block Loop Splitting (BLS). We implemented sequential as well as parallel algorithms for frequent set mining and compared their performance on the mushroom [9] database, which has approximately 10000 transactions, 120 distinct items, and an average transaction size of 23. Alternate loop splitting was found to achieve better time performance than block loop splitting. Both parallel techniques were also found to perform better than the sequential algorithm.
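The abstract names the two assignment schemes without defining them. The following is a minimal sketch of how they might look, assuming ALS means cyclic (round-robin) assignment and BLS means contiguous block assignment, as the terms are commonly used for loop splitting. The function names and the sample item list are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def alternate_loop_split(items: Sequence[str], n_nodes: int) -> List[List[str]]:
    """Assumed ALS: item i goes to node i % n_nodes (round-robin)."""
    return [list(items[node::n_nodes]) for node in range(n_nodes)]


def block_loop_split(items: Sequence[str], n_nodes: int) -> List[List[str]]:
    """Assumed BLS: each node gets one consecutive block of items."""
    base, extra = divmod(len(items), n_nodes)
    blocks: List[List[str]] = []
    start = 0
    for node in range(n_nodes):
        # The first `extra` nodes take one additional item each.
        size = base + (1 if node < extra else 0)
        blocks.append(list(items[start:start + size]))
        start += size
    return blocks


# Hypothetical frequent items, e.g. ordered by support in the inverted matrix.
frequent_items = ["A", "B", "C", "D", "E", "F", "G"]
als = alternate_loop_split(frequent_items, 3)
bls = block_loop_split(frequent_items, 3)
print("ALS:", als)
print("BLS:", bls)
```

Under these assumptions, each node would then build COFI-trees only for its own items. If the items are ordered by support and the mining cost per item varies along that order, a cyclic split spreads cheap and expensive items across nodes, which is one plausible (but here unverified) explanation for the reported advantage of ALS.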

Sanjay Garg | Sanjay D. Bhanderi

[1] Osmar R. Zaïane, et al. Parallel association rule mining with minimum inter-processor communication, 2003, Proceedings of the 14th International Workshop on Database and Expert Systems Applications.

[2] R. Suganya, et al. Data Mining Concepts and Techniques, 2010.

[3] Philip S. Yu, et al. Efficient parallel data mining for association rules, 1995, CIKM '95.

[4] Osmar R. Zaïane, et al. Inverted matrix: efficient discovery of frequent items in large datasets in the context of interactive mining, 2003, KDD '03.

[5] Philip S. Yu, et al. An effective hash-based algorithm for mining association rules, 1995, SIGMOD '95.

[6] Ramakrishnan Srikant, et al. Fast Algorithms for Mining Association Rules in Large Databases, 1994, VLDB.

[7] Rakesh Agrawal, et al. Parallel Mining of Association Rules, 1996, IEEE Trans. Knowl. Data Eng.

[8] Osmar R. Zaïane, et al. COFI-tree Mining: A New Approach to Pattern Growth with Reduced Candidacy Generation, 2003, FIMI.

[9] Andreas Mueller, et al. Fast sequential and parallel algorithms for association rule mining: a comparison, 1995.