Apriori Versions Based on MapReduce for Mining Frequent Patterns on Big Data

Pattern mining is one of the most important tasks for extracting meaningful and useful information from raw data. This task aims to extract item-sets that represent any type of homogeneity and regularity in data. Although many efficient algorithms have been developed in this regard, the growing interest in data has caused the performance of existing pattern mining techniques to drop. The goal of this paper is to propose new efficient pattern mining algorithms that work on big data. To this end, a series of algorithms based on the MapReduce framework and its open-source Hadoop implementation is proposed. The proposed algorithms can be divided into three main groups. First, two algorithms [Apriori MapReduce (AprioriMR) and iterative AprioriMR] with no pruning strategy are proposed, which extract any existing item-set in data. Second, two algorithms (space pruning AprioriMR and top AprioriMR) are proposed that prune the search space by means of the well-known anti-monotone property. Finally, one algorithm (maximal AprioriMR) is proposed for mining condensed representations of frequent patterns. To test the performance of the proposed algorithms, a varied collection of big data datasets has been considered, comprising up to $3 \cdot 10^{18}$ transactions and more than 5 million distinct single items. The experimental stage includes comparisons against highly efficient and well-known pattern mining algorithms. The results reveal the benefit of applying MapReduce versions when complex problems are considered, and also the unsuitability of this paradigm when dealing with small data.
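As an illustration of the two ideas named in the abstract (MapReduce-style support counting and pruning via the anti-monotone property), the following is a minimal in-memory sketch. It is not the authors' AprioriMR code and does not use Hadoop; the function names `map_phase`, `reduce_phase`, and `frequent_itemsets`, the `min_support` threshold, and the toy transactions are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def map_phase(transaction, size, previous=None):
    """Hypothetical mapper: emit (item-set, 1) for each item-set of `size` in one transaction.

    If `previous` (the frequent item-sets of size - 1) is given, a candidate is
    skipped as soon as one of its sub-item-sets is not frequent. This is the
    anti-monotone property: a superset of an infrequent item-set is infrequent.
    """
    items = sorted(set(transaction))
    for candidate in combinations(items, size):
        if previous is not None and any(
            sub not in previous for sub in combinations(candidate, size - 1)
        ):
            continue
        yield candidate, 1


def reduce_phase(pairs):
    """Hypothetical reducer: sum the counts emitted for each item-set key."""
    support = Counter()
    for itemset, count in pairs:
        support[itemset] += count
    return support


def frequent_itemsets(transactions, min_support):
    """Level-wise driver: one simulated map/reduce round per item-set size."""
    frequent = {}
    previous = None
    size = 1
    while True:
        pairs = (p for t in transactions for p in map_phase(t, size, previous))
        counts = reduce_phase(pairs)
        level = {s: c for s, c in counts.items() if c >= min_support}
        if not level:
            break
        frequent.update(level)
        previous = level
        size += 1
    return frequent


# Toy data, assumed for illustration only.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c"],
]
result = frequent_itemsets(transactions, min_support=3)
```

Calling the driver without the `previous` argument in `map_phase` would correspond to counting every item-set with no pruning, which is closer in spirit to the first group of algorithms; in a real deployment the mapper and reducer would run as distributed tasks rather than as local generators.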
