Performance Analysis Using Apriori Algorithm Along with Spark and Python

We propose an improved Apriori algorithm, derived by comparing different data structures, that achieves better performance than currently available approaches. The approach targets large transaction datasets, where space and time efficiency are central concerns. The improved algorithm builds on the existing Apriori approach and produces its output in less time. We implement it on the Spark framework using PySpark, which can process data at a much higher rate than the Hadoop framework. We also argue that using Python as the implementation language yields a faster computational rate. The data are stored on a local file system. In addition, we report the running times measured on the Spark framework and use these measurements to compare the Spark-based analysis of our proposed algorithm. The proposed method can also be applied effectively to big data mining optimization.
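For orientation, the baseline that the proposed method builds on can be sketched in a few lines of plain Python. This is a minimal illustration of the classic Apriori join, prune, and count steps, not the improved algorithm or the data structures evaluated in the paper, and it does not use Spark. The function name `apriori`, the `min_support` parameter (an absolute transaction count), and the sample transactions are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets.

    Hypothetical helper for illustration; min_support is an absolute count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items and keep the frequent ones.
    item_counts = Counter(item for t in transactions for item in t)
    frequent = {frozenset([i]): c for i, c in item_counts.items()
                if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: a candidate survives only if every (k-1)-subset is frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        # Count step: one scan over the transactions per level.
        counts = Counter(c for t in transactions for c in candidates if c <= t)
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


# Made-up sample data, not taken from the paper.
transactions = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola"],
]
result = apriori(transactions, min_support=3)
for itemset, count in sorted(result.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The count step rescans every transaction at each level, which is the part that dominates running time on large datasets; in a Spark setting that scan is the natural piece to distribute across partitions.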
