Frequent Itemsets Mining in Data Streams Using Reconfigurable Hardware

Data streams are unbounded flows of data that arrive at high rates and cannot be stored for offline processing. As a result, classical data mining approaches cannot be applied directly in data stream scenarios. This paper introduces a single-pass, hardware-based algorithm for frequent itemset mining on data streams that uses the top-k frequent 1-itemsets. Experimental results from the hardware implementation of the proposed algorithm are also presented and discussed.
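The abstract does not describe how the top-k frequent 1-itemsets are maintained. As a rough software illustration of the single-pass idea only, the sketch below tracks approximate top-k single items with a Space-Saving-style bounded counter table. This is a swapped-in, well-known technique, not the paper's hardware algorithm; the function name `space_saving_top_k` and the parameters `k` and `capacity` are hypothetical and chosen for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


def space_saving_top_k(
    stream: Iterable[Iterable[Hashable]], k: int, capacity: int
) -> List[Tuple[Hashable, int]]:
    """Approximate top-k frequent single items in one pass over the stream.

    Assumption: each stream element is a transaction (an iterable of items).
    At most `capacity` counters are kept, so memory stays bounded no matter
    how long the stream is. Counts may be overestimated after evictions.
    """
    counters: Dict[Hashable, int] = {}
    for transaction in stream:
        # Count each item at most once per transaction.
        for item in set(transaction):
            if item in counters:
                counters[item] += 1
            elif len(counters) < capacity:
                counters[item] = 1
            else:
                # Table is full: evict the item with the smallest count and
                # let the newcomer inherit that count plus one.
                victim = min(counters, key=counters.get)
                counters[item] = counters.pop(victim) + 1
    # Sort by descending count, breaking ties by the item's string form.
    ranked = sorted(counters.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return ranked[:k]


if __name__ == "__main__":
    transactions = [["a", "b"], ["a", "c"], ["a", "b"], ["d"]]
    print(space_saving_top_k(transactions, k=2, capacity=10))
```

When `capacity` is at least the number of distinct items, the counts are exact; with a smaller table, the counts become upper bounds, which is the usual trade-off of bounded-memory stream summaries.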
