FP-Growth Parallel Algorithm in Cluster System

When the dataset is very large, both the memory usage and the computational cost of the FP-Growth algorithm become prohibitive. This paper proposes a parallel algorithm designed to run on a PC cluster. The algorithm uses the projection method to find the conditional pattern bases of all frequent items. It splits the mining task into a number of independent sub-tasks, executes these sub-tasks in parallel on the cluster nodes, and aggregates the sub-results into the final result. Experiments show that the parallel algorithm not only accelerates the computation and avoids memory overflow, but also achieves much better scalability than the FP-Growth algorithm.
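The split, execute and aggregate scheme described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that each frequent item's conditional pattern base forms one independent sub-task, it uses a local thread pool as a stand-in for cluster nodes, and it mines each base by repeated projection over plain lists rather than an FP-tree. All function names (`project`, `mine`, `parallel_fp_mine`) are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def project(transactions, min_support):
    """Return (frequent item counts, conditional pattern base per item)."""
    counts = Counter(item for t in transactions for item in set(t))
    freq = {i: c for i, c in counts.items() if c >= min_support}
    # Rank frequent items by descending support (ties broken by name).
    order = sorted(freq, key=lambda i: (-freq[i], i))
    rank = {item: r for r, item in enumerate(order)}
    bases = {item: [] for item in freq}
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in rank), key=rank.get)
        # The prefix preceding an item is one path of its pattern base.
        for pos, item in enumerate(ordered):
            if pos > 0:
                bases[item].append(ordered[:pos])
    return freq, bases


def mine(suffix, support, base, min_support):
    """Sub-task: find every frequent itemset that extends `suffix`."""
    results = {frozenset(suffix): support}
    freq, bases = project(base, min_support)
    for item, count in freq.items():
        results.update(mine(suffix + (item,), count, bases[item], min_support))
    return results


def parallel_fp_mine(transactions, min_support, workers=4):
    """Split by frequent item, mine the sub-tasks concurrently, then merge."""
    freq, bases = project(transactions, min_support)
    merged = {}
    # Assumption: threads stand in for the nodes of a PC cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(mine, (item,), freq[item], bases[item], min_support)
            for item in freq
        ]
        for future in futures:
            merged.update(future.result())  # aggregate the sub-results
    return merged


transactions = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola"],
]
itemsets = parallel_fp_mine(transactions, min_support=3)
print(len(itemsets))                           # 8
print(itemsets[frozenset({"beer", "diaper"})])  # 3
```

Because each sub-task reads only its own conditional pattern base, the sub-tasks share no state, and the sub-results can be merged with a plain dictionary update. On a real cluster the pattern bases would be shipped to separate nodes, and no single node would need to hold the complete dataset in memory.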