A New Data Stream Mining Algorithm for Interestingness-Rich Association Rules

Frequent itemset mining and association rule generation are challenging tasks on data streams. Although various algorithms have been proposed for them, frequency alone does not determine the significance or interestingness of a mined itemset, and hence of the association rules derived from it. This has motivated algorithms that mine association rules based on utility, i.e. the usefulness of the mined rules. However, few algorithms in the literature deal with utility, as most of them focus on reducing the complexity of frequent itemset or association rule mining. Moreover, those few algorithms consider only the overall utility of the association rules and not the consistency of the rules over a defined number of periods. To address this issue, this paper proposes an enhanced association rule mining algorithm. The algorithm adds a new weightage validation step to conventional association rule mining algorithms in order to validate both the utility of the mined association rules and the consistency of that utility. The utility is validated by an integrated calculation that combines the cost/price efficiency of an itemset with its frequency. The consistency validation is performed after every defined number of windows using the probability distribution function, assuming that the weights are normally distributed. The rules that pass validation are therefore frequent and utility efficient, and their interestingness is distributed throughout the entire time period. The algorithm is implemented, and the resulting rules are compared against the rules obtained from conventional mining algorithms.
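The abstract does not give the formulas, so the following is only a minimal sketch of how the two validations could fit together, not the paper's actual method. It assumes that a window is a list of transactions mapping items to quantities, that the weight of an itemset in a window is its support multiplied by its share of the window's total profit, and that an itemset counts as consistent when a normal distribution fitted to its per-window weights puts at least `min_prob` of its mass above `min_weight`. The names `window_weight`, `is_consistent`, `unit_profit`, `min_weight`, and `min_prob` are all hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def window_weight(transactions, itemset, unit_profit):
    """Hypothetical weight of an itemset in one window:
    support multiplied by the itemset's share of the window's total profit."""
    itemset = frozenset(itemset)
    # Transactions (dicts of item -> quantity) that contain every item of the itemset.
    hits = [t for t in transactions if itemset.issubset(t)]
    if not hits:
        return 0.0
    support = len(hits) / len(transactions)
    itemset_profit = sum(t[i] * unit_profit[i] for t in hits for i in itemset)
    total_profit = sum(q * unit_profit[i] for t in transactions for i, q in t.items())
    return support * itemset_profit / total_profit


def is_consistent(weights, min_weight, min_prob=0.8):
    """Assumed consistency check over a batch of windows: fit a normal
    distribution to the per-window weights and require that the estimated
    probability of a weight exceeding min_weight is at least min_prob."""
    mu = mean(weights)
    sigma = pstdev(weights)
    if sigma == 0:
        # All windows have the same weight; compare it directly.
        return mu >= min_weight
    prob_above = 1.0 - NormalDist(mu, sigma).cdf(min_weight)
    return prob_above >= min_prob


# Example: one window of four transactions and two batches of per-window weights.
unit_profit = {"a": 2.0, "b": 1.0, "c": 5.0}
window = [{"a": 1, "b": 1}, {"a": 1, "b": 1}, {"c": 1}, {"a": 2, "b": 1}]
w = window_weight(window, {"a", "b"}, unit_profit)

steady = is_consistent([0.50, 0.52, 0.48, 0.51], min_weight=0.4)
bursty = is_consistent([0.9, 0.0, 0.9, 0.0], min_weight=0.4)
```

Under these assumptions, the steady weights pass the check and the bursty ones fail it, even though their average weights are close. This is the distinction the abstract draws between overall utility and utility that is consistent across periods.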
