Paper Information - Performance Analysis Using Apriori Algorithm Along with Spark and Python

We propose an improved Apriori algorithm, developed by comparing different data structures, that aims to outperform currently available approaches. It targets large transaction datasets, where space and time management are central concerns. The improved algorithm builds on the existing Apriori approach and produces its output in less time. We implement it on the Spark framework using PySpark, which can process data at a much higher rate than the Hadoop framework. We also propose that using Python as the programming language gives a faster computation rate. The data is stored on a local file system. In addition, we measure the algorithm's time efficiency on the Spark framework and use those measurements to generate a report for a Spark-based analysis of the proposed algorithm. The proposed method can also be applied effectively to big data mining optimization.
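
For orientation, the classic Apriori procedure that the abstract builds on can be sketched in plain Python as below. This is a minimal single-machine illustration of the textbook join, prune, and count steps, not the authors' improved algorithm and not their PySpark implementation. The function name `apriori`, its parameters, and the sample transactions are assumptions made for this example only.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    Hypothetical helper for illustration; min_support is an absolute count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items and keep those meeting the threshold.
    item_counts = Counter(item for t in transactions for item in t)
    current = {
        frozenset([item]): count
        for item, count in item_counts.items()
        if count >= min_support
    }
    frequent = dict(current)

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-item candidates.
        prev = list(current)
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count step: one scan over the transactions per level.
        counts = Counter(c for t in transactions for c in candidates if c <= t)
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Made-up sample data, used only to exercise the sketch.
sample = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
for itemset, count in sorted(apriori(sample, 3).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The count step rescans every transaction once per level, which is the cost that data-structure choices and distributed execution of the kind the abstract describes are meant to reduce.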

Fei Gao | Jiangjiang Liu | Chandrima Bhowmick

[1] Hui Yang, et al. Using HMT and HASH_TREE to Optimize Apriori Algorithm, 2011, 2011 International Conference on Business Computing and Global Informatization.

[2] Scott Shenker, et al. Fast and Interactive Analytics over Hadoop Data with Spark, 2012, ;login: Usenix Mag.

[3] Paul E. Black, et al. Software vulnerabilities precluded by spark, 2011, SIGAda.

[4] H. Parikh, et al. A Survey on Big Data Analysis and Challenges, 2015.