Apriori Parallel Improved Algorithm Based on MapReduce Distributed Architecture

In big-data settings, the traditional serial Apriori algorithm is inefficient and generates a large number of candidate itemsets when processing massive data sets. This paper proposes an improved parallel algorithm based on the MapReduce distributed architecture. Building on the basic MapReduce implementation of Apriori, it restructures the original transaction database and parallelizes the computation across data set fragments. The algorithm optimizes the transaction database, the counting of candidate itemsets, and the pruning strategy. Experimental results show that the improved algorithm reduces the number of candidate itemsets and improves efficiency.
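
For orientation, the baseline that the paper improves on can be sketched in a few lines of Python. This is only an illustrative, single-process sketch of the basic Apriori-on-MapReduce idea (count candidates per data fragment in a map step, sum the counts in a reduce step, then prune by minimum support); it is not the improved algorithm proposed here. All names (`map_count`, `reduce_counts`, `generate_candidates`, `apriori`) and the use of plain Python lists as "fragments" are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def map_count(shard, candidates):
    """Map step: count the candidates contained in each transaction of one fragment."""
    counts = Counter()
    for transaction in shard:
        items = set(transaction)
        for candidate in candidates:
            if candidate <= items:  # candidate is a subset of the transaction
                counts[candidate] += 1
    return counts


def reduce_counts(partials, min_support):
    """Reduce step: sum per-fragment counts and keep itemsets meeting min_support."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return {c: n for c, n in total.items() if n >= min_support}


def generate_candidates(frequent, k):
    """Build k-itemsets whose (k-1)-subsets are all frequent (Apriori pruning)."""
    items = sorted({i for itemset in frequent for i in itemset})
    return [
        frozenset(combo)
        for combo in combinations(items, k)
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1))
    ]


def apriori(shards, min_support):
    """Level-wise search; each level corresponds to one map/reduce pass over all fragments."""
    all_items = sorted({i for shard in shards for t in shard for i in t})
    candidates = [frozenset([i]) for i in all_items]
    frequent, k = {}, 1
    while candidates:
        level = reduce_counts((map_count(s, candidates) for s in shards), min_support)
        frequent.update(level)
        k += 1
        candidates = generate_candidates(level, k)
    return frequent
```

Calling `apriori(shards, min_support)` with `shards` as a list of transaction lists returns a dictionary from frequent itemsets (as `frozenset` objects) to their absolute support counts. In a real MapReduce deployment the calls to `map_count` would run on separate nodes, one per data fragment; here they run sequentially to keep the sketch self-contained.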