A novel process-based association rule approach through maximal frequent itemsets for big data processing

Mining maximal frequent itemsets has become an active research topic in big data processing. Most previous work analyzes the data directly with existing approaches, which leads to redundant computation, high time complexity, and large storage requirements. To address these problems, this paper proposes HMAM, a Heuristic MapReduce-based Association rule approach through Maximal frequent itemsets mining. The main idea is as follows. First, operating directly on the transaction database, we allocate transactions to different processing nodes and group all transactions by dimension. Then, we screen the most frequent transactions from each transaction set using Bitmap-Sort and obtain the best-transaction-set by aggregating the transaction-elects of each transaction set. The current candidate maximal frequent itemsets are acquired by removing sub-transactions according to the inclusion relations among the items in the best-transaction-set. At the same time, each subset of the sub-transactions in the candidate maximal frequent itemsets is discarded from all transaction sets. The final candidate maximal frequent itemsets are then obtained by iterating until every transaction set is empty. Finally, the maximal frequent itemsets are obtained by applying the minimum support threshold. Experimental results show that, compared with existing approaches, HMAM largely avoids generating the many candidate itemsets that result from join operations, speeds up the mining of maximal frequent itemsets, and improves resource utilization.

Allocate transactions to different nodes and group them in terms of dimension. Screen the most frequent transactions from each transaction set using Bitmap-Sort. Obtain maximal frequent itemsets by employing the minimum support threshold.
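To make the terminology concrete, the following is a minimal single-machine sketch in Python. It is not the HMAM algorithm and involves no MapReduce or Bitmap-Sort; it only illustrates two notions the abstract relies on: grouping transactions by dimension (taken here to mean the number of distinct items in a transaction, which is an assumption) and the definition of a maximal frequent itemset under a minimum support threshold. The function names `group_by_dimension`, `support`, and `maximal_frequent_itemsets` are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def group_by_dimension(transactions):
    """Group transactions by dimension (assumed: number of distinct items)."""
    groups = defaultdict(list)
    for t in transactions:
        items = frozenset(t)
        groups[len(items)].append(items)
    return dict(groups)


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= frozenset(t))


def maximal_frequent_itemsets(transactions, min_support):
    """Brute-force reference: frequent itemsets with no frequent proper superset.

    Exponential in the number of items, so suitable only for tiny examples.
    """
    items = sorted({i for t in transactions for i in t})
    frequent = []
    for k in range(1, len(items) + 1):
        level = [
            frozenset(c)
            for c in combinations(items, k)
            if support(frozenset(c), transactions) >= min_support
        ]
        if not level:
            # Anti-monotonicity: if no k-itemset is frequent,
            # no larger itemset can be frequent either.
            break
        frequent.extend(level)
    # Keep only itemsets that are not strictly contained in another frequent one.
    return [s for s in frequent if not any(s < other for other in frequent)]


if __name__ == "__main__":
    tx = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"], ["d"]]
    print(group_by_dimension(tx))
    print(maximal_frequent_itemsets(tx, min_support=3))
```

With the toy database above and a minimum support of 3, the itemset {a, b, c} appears in only two transactions and is therefore not frequent, so the maximal frequent itemsets are the three pairs {a, b}, {a, c}, and {b, c}. Lowering the threshold to 2 makes {a, b, c} frequent, and it then becomes the only maximal frequent itemset. The brute-force enumeration is the kind of candidate explosion that approaches such as HMAM aim to avoid.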
