Parallel frequent set mining using inverted matrix approach

Mining frequent patterns in large transactional databases is considered one of the most important data mining problems. The recent explosive growth in data collection has left current rule mining algorithms too restricted to analyze excessively large transaction sets, because they suffer from several problems when mining massive transaction datasets. Some of the major problems are: (1) the need for multiple database scans, (2) massive computational power requirements, (3) huge memory requirements, (4) lack of parallelism, and (5) limited interactivity across different support values ([1–2]). In this paper, the inverted matrix, a new representation of the transactional database, is used and distributed amongst parallel nodes. Frequent items from the inverted matrix are assigned to the parallel nodes. In the parallel implementation, each parallel node generates a Co-Occurrence Frequent Item (COFI) tree for each frequent item assigned to it. The mining process is carried out by all nodes, each of which generates all frequent itemsets in which its assigned items participate. Little communication is required between the master node and the parallel nodes to generate all frequent itemsets. Two techniques are used to assign frequent items to the parallel nodes: (1) Alternate Loop Splitting (ALS) and (2) Block Loop Splitting (BLS). We have implemented sequential as well as parallel algorithms for frequent set mining and compared their performance on the mushroom [9] database, which has approximately 10000 transactions, 120 different items, and an average transaction size of 23. Alternate loop splitting is found to achieve better time performance than block loop splitting, and both parallel techniques are found to perform better than the sequential algorithm.
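The two assignment techniques can be illustrated with a minimal sketch. It assumes the usual loop-splitting reading of the names, which the abstract itself does not spell out: ALS hands out frequent items in a round-robin (cyclic) fashion, while BLS gives each node one contiguous block. The function names, the item labels, and the node count are hypothetical and chosen only for illustration.

```python
def alternate_loop_split(items, num_nodes):
    """Assumed ALS: item at position i goes to node i % num_nodes (cyclic)."""
    return [items[node::num_nodes] for node in range(num_nodes)]


def block_loop_split(items, num_nodes):
    """Assumed BLS: each node receives one contiguous block of items."""
    base, extra = divmod(len(items), num_nodes)
    blocks, start = [], 0
    for node in range(num_nodes):
        # The first `extra` nodes take one additional item each.
        size = base + (1 if node < extra else 0)
        blocks.append(items[start:start + size])
        start += size
    return blocks


# Hypothetical frequent items, e.g. in the order they appear in the inverted matrix.
frequent_items = ["A", "B", "C", "D", "E", "F", "G"]

als = alternate_loop_split(frequent_items, 3)
bls = block_loop_split(frequent_items, 3)

for node, (a, b) in enumerate(zip(als, bls)):
    print(f"node {node}: ALS={a} BLS={b}")
```

Under this reading, if the items are ordered by frequency and the cost of building a COFI tree varies with an item's position in that order, a cyclic assignment spreads expensive and cheap items across nodes, whereas a block assignment may concentrate the expensive ones on a single node. That would be one plausible explanation for the reported advantage of ALS, though it is an interpretation rather than a claim made in the abstract.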