MR-ARM: A Map-Reduce Association Rule Mining Framework

Association rule mining is one of the primary data mining tasks; it discovers correlations among items in a transactional database. Most vertical and horizontal association rule mining algorithms have been developed to improve the frequent item discovery step, which places high demands on training time and memory usage, particularly when the input database is very large. In this paper, we tackle the problem of mining very large data by proposing a new parallel Map-Reduce (MR) association rule mining technique called MR-ARM, which uses a hybrid data transformation format to quickly find frequent items and generate rules. The MR programming paradigm is becoming popular for large-scale, data-intensive distributed applications because of its efficiency, simplicity and ease of use; the proposed algorithm therefore develops a fast parallel distributed batch set intersection method for finding frequent items. Two implementations (Weka, Hadoop) of the proposed MR association rule algorithm have b...
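The general idea of finding frequent items with Map-Reduce and set intersection can be illustrated with a small, in-process sketch. It is not the authors' MR-ARM algorithm or their batch intersection method: it simulates the map, shuffle and reduce phases in plain Python, assumes a simple vertical (tid-set) format rather than the paper's hybrid format, and uses ordinary Eclat-style pairwise tid-set intersection to count 2-itemsets. The toy transactions, the `min_support` threshold and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical horizontal database: transaction id -> items.
TRANSACTIONS = {
    1: ["bread", "milk"],
    2: ["bread", "butter", "milk"],
    3: ["butter", "milk"],
    4: ["bread", "butter"],
    5: ["bread", "butter", "milk"],
    6: ["eggs", "milk"],
}


def map_phase(tid, items):
    """Mapper: emit one (item, tid) pair per distinct item in a transaction."""
    for item in set(items):
        yield item, tid


def shuffle(pairs):
    """Group mapper output by key, as the MR runtime does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(tids, min_support):
    """Reducer: turn an item's tids into a tid-set; return None if infrequent."""
    tidset = frozenset(tids)
    return tidset if len(tidset) >= min_support else None


def frequent_items(transactions, min_support):
    """Run map -> shuffle -> reduce and keep the frequent single items."""
    pairs = (p for tid, items in transactions.items() for p in map_phase(tid, items))
    result = {}
    for item, tids in shuffle(pairs).items():
        tidset = reduce_phase(tids, min_support)
        if tidset is not None:
            result[item] = tidset
    return result


def frequent_pairs(tidsets, min_support):
    """Support of a 2-itemset is the size of its items' tid-set intersection."""
    result = {}
    for a, b in combinations(sorted(tidsets), 2):
        common = tidsets[a] & tidsets[b]
        if len(common) >= min_support:
            result[(a, b)] = len(common)
    return result


if __name__ == "__main__":
    singles = frequent_items(TRANSACTIONS, min_support=3)
    # {'bread': 4, 'butter': 4, 'milk': 5}  ("eggs" is filtered out)
    print({item: len(t) for item, t in sorted(singles.items())})
    # {('bread', 'butter'): 3, ('bread', 'milk'): 3, ('butter', 'milk'): 3}
    print(frequent_pairs(singles, min_support=3))
```

Once every item carries its tid-set, counting a candidate itemset reduces to a set intersection. No further scan of the horizontal transactions is needed, and this is what makes vertical formats attractive for distributed frequent item discovery. In a real Hadoop job, the mapper and reducer would run on separate nodes and the framework would perform the shuffle.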
