On the design of hardware architectures for parallel frequent itemsets mining

Abstract

Algorithms for Frequent Itemsets Mining have proved effective for extracting frequent sets of patterns from datasets. However, in some specific cases, they do not obtain the expected results in an acceptable time. For this reason, Field Programmable Gate Array (FPGA)-based architectures for Frequent Itemsets Mining have been proposed to accelerate this task. This paper proposes a search strategy for Frequent Itemsets Mining based on equivalence classes partitioning. Partitioning into equivalence classes divides the search space into disjoint sets that can be processed in parallel. Building on this strategy, the paper presents the design and implementation of two hardware architectures that exploit its nested parallelism. These architectures can obtain frequent itemsets regardless of the number of distinct items and the number of transactions in the dataset, which are the main limitations reported in the reviewed literature. Furthermore, the proposed architectures explore the trade-off between acceleration and hardware resource utilization. The experimental results show that the proposed search strategy can be scaled to achieve a speedup of 40 times in processing time over software-based implementations.
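
The abstract does not detail how the equivalence classes are formed, so the following is only a software sketch under stated assumptions, not the authors' hardware design. It assumes Eclat-style prefix-based classes: with items in lexicographic order, class [i] holds every frequent itemset whose smallest item is i, so the classes are disjoint and each can be mined independently. Transaction sets are kept as bit vectors (Python integers here, standing in for hardware bit vectors), and support is the popcount of an intersection. The function names `vertical_bitsets`, `mine_class`, and `frequent_itemsets` and the `workers` parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Itemset = Tuple[str, ...]


def vertical_bitsets(transactions: List[List[str]]) -> Dict[str, int]:
    """Map each item to a bitset whose bit t is set if transaction t contains it."""
    bits: Dict[str, int] = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            bits[item] = bits.get(item, 0) | (1 << tid)
    return bits


def mine_class(prefix: Itemset, prefix_bits: int,
               candidates: List[Tuple[str, int]],
               min_support: int) -> Dict[Itemset, int]:
    """Depth-first mining of the equivalence class of `prefix`.

    `candidates` are items larger than the prefix's last item, each paired
    with a bitset; intersecting with `prefix_bits` yields the class members.
    """
    members = []
    for item, item_bits in candidates:
        bits = prefix_bits & item_bits          # tidset intersection
        support = bin(bits).count("1")          # popcount = support
        if support >= min_support:
            members.append((item, bits, support))
    result: Dict[Itemset, int] = {}
    for idx, (item, bits, support) in enumerate(members):
        itemset = prefix + (item,)
        result[itemset] = support
        # The sub-class [itemset] can only be extended by later members.
        later = [(other, other_bits) for other, other_bits, _ in members[idx + 1:]]
        result.update(mine_class(itemset, bits, later, min_support))
    return result


def frequent_itemsets(transactions: List[List[str]], min_support: int,
                      workers: int = 4) -> Dict[Itemset, int]:
    """Mine all frequent itemsets, one task per top-level equivalence class."""
    bits = vertical_bitsets(transactions)
    # Frequent single items in a fixed lexicographic order (assumed ordering).
    singles = sorted((item, b) for item, b in bits.items()
                     if bin(b).count("1") >= min_support)
    result: Dict[Itemset, int] = {(item,): bin(b).count("1") for item, b in singles}
    # Class [i] may only be extended by items after i, so classes are disjoint.
    jobs = [((item,), b, singles[idx + 1:]) for idx, (item, b) in enumerate(singles)]
    # Threads stand in for parallel processing units; each class is independent.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda job: mine_class(*job, min_support), jobs):
            result.update(partial)
    return result


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    print(sorted(frequent_itemsets(db, min_support=3).items()))
    # [(('a',), 4), (('a', 'b'), 3), (('a', 'c'), 3), (('b',), 4), (('b', 'c'), 3), (('c',), 4)]
```

Because no two classes share an itemset, their partial results can be merged without coordination, which is the property that makes a one-unit-per-class hardware mapping plausible. In CPython the threads illustrate this independence rather than deliver a real speedup.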
