The MapReduce Model on Cascading Platform for Frequent Itemset Mining

Implementing parallel algorithms has become an active research topic. Parallelism is well suited to large-scale data processing, and MapReduce is one of the parallel and distributed programming models. Implementing parallel programs, however, faces many difficulties. Cascading provides a simpler scheme on top of the Hadoop system, which implements the MapReduce model. Frequent itemsets are the sets of objects that appear most often in a dataset. Frequent Itemset Mining (FIM) requires complex computation and becomes a complicated problem on large-scale data. This paper discusses the implementation of the MapReduce model on Cascading for FIM. The experiment uses the Amazon product co-purchasing network metadata dataset. It shows that the simple mechanism of Cascading can be used to solve the FIM problem. The approach gives time complexity O(n), more efficient than the nonparallel approach, which has complexity O(n^2/m).
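As a rough illustration of how FIM maps onto the MapReduce model, the following minimal sketch counts k-item subsets of transactions in a map step and sums them in a reduce step. It is plain, single-process Python and not the Cascading or Hadoop API; the function names (map_phase, reduce_phase, frequent_itemsets), the min_support threshold, and the toy transactions are all assumptions made for this example and do not come from the paper.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, k):
    """Emit (itemset, 1) for every k-item subset of one transaction."""
    items = sorted(set(transaction))  # canonical order so equal itemsets match
    for itemset in combinations(items, k):
        yield itemset, 1


def reduce_phase(pairs):
    """Sum the emitted counts per itemset (the 'reduce by key' step)."""
    counts = Counter()
    for itemset, count in pairs:
        counts[itemset] += count
    return counts


def frequent_itemsets(transactions, k, min_support):
    """Keep only the itemsets that occur in at least min_support transactions."""
    pairs = (pair for t in transactions for pair in map_phase(t, k))
    counts = reduce_phase(pairs)
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


# Hypothetical toy data standing in for co-purchase records.
transactions = [
    ["book", "dvd", "cd"],
    ["book", "dvd"],
    ["dvd", "cd"],
    ["book", "cd", "dvd"],
]
result = frequent_itemsets(transactions, k=2, min_support=3)
print(result)
```

In a real MapReduce deployment the map step would run in parallel over partitions of the transactions and the framework would group the emitted pairs by key before reduction; here both steps run sequentially in one process purely to show the data flow.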
