An efficient vertical-Apriori Mapreduce algorithm for frequent item-set mining

Algorithms such as OPUS and Apriori-based MapReduce have been proposed in the literature to make frequent item-set mining from transactional datasets more efficient for pattern recognition applications. Most of these algorithms, however, have been evaluated offline on relatively small datasets. When confronted with larger datasets, which are inevitable for today's organisations, most if not all of them do not perform efficiently enough to meet the needs of real-time, big-data-driven decision making. We therefore attempt to address these efficiency problems by proposing the VAMR (Vertical-Apriori Map-reduce) algorithm. VAMR is based on data attribute identifiers, which are exploited as a capability metric for mining frequent item-sets from large datasets on a single node (for example, in a single-site enterprise) that has no distributed or parallel computing environment. Our evaluations using synthetic datasets and data from a public repository suggest that VAMR can offer superior efficiency in mining frequent item-sets from large transactional datasets.
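The abstract does not spell out VAMR's internals, so the following is only a minimal sketch of the general idea it names: Apriori run over a vertical layout (each item mapped to the set of transaction identifiers that contain it), with the layout built by a map step and a reduce step on a single node. It is not the authors' implementation. The function names `map_phase`, `reduce_phase`, and `vertical_apriori`, the use of transaction ids as the identifier, and the absolute `min_support` count are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transactions):
    """Map step (hypothetical): emit one (item, transaction id) pair per item."""
    for tid, items in enumerate(transactions):
        for item in set(items):
            yield item, tid


def reduce_phase(pairs):
    """Reduce step (hypothetical): group transaction ids by item (vertical layout)."""
    tidsets = defaultdict(set)
    for item, tid in pairs:
        tidsets[item].add(tid)
    return tidsets


def vertical_apriori(transactions, min_support):
    """Return {item-set: support count} for all item-sets meeting min_support."""
    tidsets = reduce_phase(map_phase(transactions))
    # Level 1: frequent single items and their transaction-id sets.
    current = {
        frozenset([item]): tids
        for item, tids in tidsets.items()
        if len(tids) >= min_support
    }
    frequent = {}
    while current:
        frequent.update({itemset: len(tids) for itemset, tids in current.items()})
        next_level = {}
        for a, b in combinations(list(current), 2):
            candidate = a | b
            # Join only pairs that differ in exactly one item.
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Apriori pruning: every subset one item smaller must be frequent.
            if any(candidate - {x} not in current for x in candidate):
                continue
            # Support is the size of the intersected transaction-id sets,
            # so the original transactions are never rescanned.
            tids = current[a] & current[b]
            if len(tids) >= min_support:
                next_level[candidate] = tids
        current = next_level
    return frequent


# Toy data, invented for this example only.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c"],
]
result = vertical_apriori(transactions, min_support=3)
```

In this sketch the support of a candidate comes from intersecting identifier sets rather than rescanning the dataset, which is the usual motivation for a vertical layout. Whether VAMR computes support the same way is not stated in the abstract.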
