Frequent Pattern Mining in Big Data

Frequent Pattern Mining (FPM) is one of the best-known techniques for extracting knowledge from data. The combinatorial explosion of Frequent Itemset Mining (FIM) methods becomes even more problematic when they are applied to Big Data. Fortunately, recent improvements in the field of parallel programming already provide good tools to tackle this problem. However, these tools come with their own technical challenges, e.g. balanced data distribution and inter-communication costs. In this paper, we investigate the applicability of FIM techniques on the MapReduce platform. We introduce two new methods for mining large datasets: Dist-Eclat focuses on speed, while BigFIM is optimized to run on very large datasets. Our experiments show the scalability of our methods. Mining frequent itemsets is one of the most investigated fields in data mining, and it is a fundamental and crucial task. To improve efficiency and process the data in parallel, a MapReduce algorithm for mining frequent itemsets is proposed. First, a binary-string data structure is employed to represent the database. Hadoop is then used to process the big data in parallel using MapReduce. Large input files are stored in the Hadoop file system and processed to find the frequent patterns they contain. Here the system, which acts as a source, can produce structured noise of its own, and hence the dependency on helpers may be reduced.
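To make the binary-string representation and the map/reduce split more concrete, the following is a minimal single-process sketch in Python. It is not the Dist-Eclat or BigFIM implementation, and it does not use Hadoop; it only simulates a map step and a reduce step in memory. The function names (`map_split`, `reduce_masks`, `mine`), the toy transactions, and the choice of one bitmask per item are all assumptions made for illustration.

```python
from collections import defaultdict

def map_split(offset, transactions):
    """Map step (simulated): emit (item, bitmask) pairs for one database split.
    Bit `offset + i` is set when transaction i of the split contains the item."""
    out = defaultdict(int)
    for i, items in enumerate(transactions):
        for item in items:
            out[item] |= 1 << (offset + i)
    return list(out.items())

def reduce_masks(pairs):
    """Reduce step (simulated): OR together the partial bitmasks of each item."""
    merged = defaultdict(int)
    for item, mask in pairs:
        merged[item] |= mask
    return dict(merged)

def support(mask):
    """Number of transactions whose bit is set in the binary string."""
    return bin(mask).count("1")

def mine(masks, min_support):
    """Depth-first, Eclat-style search: AND bitmasks to extend itemsets."""
    frequent = {}

    def extend(prefix, prefix_mask, candidates):
        for idx, (item, mask) in enumerate(candidates):
            new_mask = mask if prefix_mask is None else prefix_mask & mask
            if support(new_mask) >= min_support:
                itemset = prefix + (item,)
                frequent[itemset] = support(new_mask)
                # Only later items are tried, so each itemset is built once.
                extend(itemset, new_mask, candidates[idx + 1:])

    extend((), None, sorted(masks.items()))
    return frequent

# Hypothetical toy database, divided into two splits as a stand-in for HDFS blocks.
splits = [[{"a", "b", "c"}, {"a", "c"}], [{"a", "d"}, {"b", "c"}]]

pairs, offset = [], 0
for split in splits:
    pairs.extend(map_split(offset, split))
    offset += len(split)

masks = reduce_masks(pairs)
result = mine(masks, min_support=2)
print(result)
```

With a minimum support of 2, the itemsets reported for this toy database are {a}, {b}, {c}, {a, c}, and {b, c}. Storing each item as a binary string over transaction identifiers means the support of a candidate itemset is just the number of set bits in a bitwise AND, which is why this layout pairs naturally with Eclat-style depth-first search.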
