ScaDiBino: An effective MapReduce-based association rule mining method

Many current data mining algorithms are impractical for very large data sets because their running time becomes prohibitive. Association rule mining is one of the best-known data mining tasks, and many parallel and distributed methods have been proposed for it. However, these methods are not well suited to big data for several reasons: improper data location, data skew, lack of load balancing, lack of support for generalized association rule mining, and lack of a clear method for rule extraction. The MapReduce architecture offers a parallel and distributable solution for association rule mining, but existing association rule methods need to be customized to perform well on it. In particular, iterative algorithms may perform suboptimally on MapReduce, and two main issues affect the performance of MapReduce architectures: data placement and network traffic. In this paper, a scalable and distributable binominal association rule mining method (ScaDiBino ARM) is proposed. This method converts input data items to binominal format to take advantage of the scalable and distributable attributes of MapReduce structures. The proposed method was evaluated on real traffic data from a mobile operator, with the goal of recommending value-added services (VAS) to the operator's customers. The results show that the rule extraction time improved significantly after applying the proposed rule mining method.
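The abstract does not spell out the algorithm, so the following is only a minimal single-process sketch of the general idea, under stated assumptions: "binominal format" is taken to mean one true/false flag per item, the map step emits a count of 1 for each item and item pair present in a row, and the reduce step sums those counts. All function names, item names, and thresholds are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def to_binominal(transactions):
    """Encode each transaction as a tuple of booleans, one flag per known item."""
    items = sorted({item for t in transactions for item in t})
    rows = [tuple(item in t for item in items) for t in transactions]
    return items, rows


def map_phase(items, row, max_size=2):
    """Emit (itemset, 1) for every itemset of up to max_size items set in the row."""
    present = [item for item, flag in zip(items, row) if flag]
    for size in range(1, max_size + 1):
        for itemset in combinations(present, size):
            yield itemset, 1


def reduce_phase(pairs):
    """Sum the emitted counts per itemset (stands in for the shuffle and reduce)."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def extract_rules(counts, n_rows, min_support, min_confidence):
    """Derive single-item rules lhs -> rhs from the pair and single-item counts."""
    rules = []
    for itemset, count in counts.items():
        support = count / n_rows
        if len(itemset) != 2 or support < min_support:
            continue
        a, b = itemset
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / counts[(lhs,)]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical value-added-service subscriptions, one set per customer.
transactions = [
    {"sms_pack", "data_pack"},
    {"sms_pack", "data_pack", "ringtone"},
    {"data_pack"},
    {"sms_pack", "ringtone"},
]

items, rows = to_binominal(transactions)
counts = reduce_phase(pair for row in rows for pair in map_phase(items, row))
rules = extract_rules(counts, len(rows), min_support=0.5, min_confidence=0.6)

for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Because each row is processed independently in the map step, this shape needs only one counting pass over the data instead of repeated level-wise scans, which is one plausible reason a fixed-width true/false encoding fits MapReduce well. In a real cluster the map and reduce functions would run on separate nodes over partitioned input.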
