Association Rule Generation in Streams

Abstract: Many applications generate and analyze stream data, in which data flow into and out of an observation platform or window dynamically. Data streams have distinctive features: huge or possibly infinite volume, dynamically changing content, arrival in a fixed order, and room for only one or a small number of scans. An important problem in data stream mining is finding the frequent items in a stream, with applications across domains such as financial systems, web traffic monitoring, internet advertising, retail, and e-business. These features raise new issues that must be considered when developing association rule mining techniques for stream data. In this paper, we propose an integrated online streaming algorithm that solves both the problem of finding the top-k elements and the problem of finding frequent elements in a data stream. Our Space-Saving algorithm reports both frequent and top-k elements with tight guarantees on errors. We also develop the notion of association rules in streams of elements. The Streaming-Rules algorithm is integrated with the Space-Saving algorithm to report 1-1 association rules with tight guarantees on errors, using minimal space and limited processing per element. In addition, we implement the Apriori algorithm to generate association rules for static datasets, and we implement the Streaming-Rules algorithm for pair and triplet association rules. We compare the top-k rules of the static datasets with the output on the stream datasets and report the percentage of error.
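To make the counting idea concrete, the following is a minimal Python sketch of a Space-Saving style counter as the technique is commonly described; it is not the authors' implementation. The class name `SpaceSaving`, the `capacity` and `support` parameters, and the methods `update`, `top_k`, and `frequent` are all illustrative assumptions. For brevity the sketch finds the minimum counter with a linear scan instead of a constant-time summary structure.

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSaving:
    """Approximate counter that monitors at most `capacity` distinct items.

    Illustrative sketch only: names and layout are assumptions, and eviction
    uses a linear scan over the counters rather than an O(1) structure.
    """

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.n = 0                               # stream length seen so far
        self.counts: Dict[Hashable, int] = {}    # estimated (upper-bound) counts
        self.errors: Dict[Hashable, int] = {}    # maximum possible overestimate

    def update(self, item: Hashable) -> None:
        """Process one stream element."""
        self.n += 1
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Evict the item with the smallest count; the newcomer inherits
            # that count as its error bound and is then incremented.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def top_k(self, k: int) -> List[Tuple[Hashable, int]]:
        """Return the k monitored items with the largest estimated counts."""
        ranked = sorted(self.counts.items(), key=lambda kv: (-kv[1], repr(kv[0])))
        return ranked[:k]

    def frequent(self, support: float) -> Dict[Hashable, bool]:
        """Return candidates whose estimate exceeds support * n.

        The boolean is True when count - error also exceeds the threshold,
        meaning the item is frequent regardless of the estimation error.
        """
        threshold = support * self.n
        return {
            item: (count - self.errors[item]) > threshold
            for item, count in self.counts.items()
            if count > threshold
        }


if __name__ == "__main__":
    summary = SpaceSaving(capacity=3)
    for element in "abacabadab":
        summary.update(element)
    print(summary.top_k(2))        # [('a', 5), ('b', 3)]
    print(summary.frequent(0.25))  # {'a': True, 'b': True}
```

In this sketch each estimate is an upper bound on the true frequency, and subtracting the stored error gives a lower bound, which is what allows an item to be reported as guaranteed frequent.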
