Using count prediction techniques for mining frequent patterns in transactional data streams

We study the problem of mining frequent itemsets in dynamic data streams, with particular attention to concept drift. We propose a count-prediction-based algorithm that uses predictive models to estimate the counts of itemsets and thereby identify the frequent ones. The predictive models are built from the data in the stream and serve as a description of the stream's concept. When concept drift occurs in the stream, this description is updated by reconstructing the predictive models. Our experimental results indicate that the proposed algorithm is efficient and performs stably. Moreover, using the corresponding predictive models for count-predictive mining effectively preserves the accuracy of the mining results as the concept changes.
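To make the idea of count prediction concrete, the following is a minimal sketch and not the algorithm proposed in the paper. The abstract does not specify the form of the predictive models, so everything here is an assumption made for illustration: the model is a single scale factor k fitted by least squares so that count(a, b) ≈ k · count(a) · count(b) / n, only 2-itemsets are considered, and the function names (fit_scale, predict_frequent_pairs, relative_error) are hypothetical. A rising relative error on a new batch would be one possible signal that the concept has drifted and the model should be refitted.

```python
from collections import Counter
from itertools import combinations


def item_and_pair_counts(batch):
    """Exact counts of single items and of 2-itemsets in one batch."""
    items, pairs = Counter(), Counter()
    for transaction in batch:
        unique = sorted(set(transaction))
        items.update(unique)
        pairs.update(combinations(unique, 2))
    return items, pairs


def fit_scale(batch):
    """Fit k in count(a, b) ~ k * count(a) * count(b) / n by least squares.

    Assumed model form (illustrative only): an independence estimate
    corrected by one learned factor.
    """
    items, pairs = item_and_pair_counts(batch)
    n = len(batch)
    num = den = 0.0
    for a, b in combinations(sorted(items), 2):
        x = items[a] * items[b] / n
        num += x * pairs[(a, b)]  # Counter yields 0 for unseen pairs
        den += x * x
    return num / den if den else 1.0


def predict_frequent_pairs(batch, k, min_support):
    """Estimate pair counts from item counts only; keep the frequent ones."""
    items = Counter(i for t in batch for i in set(t))
    n = len(batch)
    threshold = min_support * n
    frequent = {}
    for a, b in combinations(sorted(items), 2):
        estimate = k * items[a] * items[b] / n
        if estimate >= threshold:
            frequent[(a, b)] = estimate
    return frequent


def relative_error(batch, k):
    """Total absolute prediction error divided by total true pair count."""
    items, pairs = item_and_pair_counts(batch)
    n = len(batch)
    err = total = 0.0
    for a, b in combinations(sorted(items), 2):
        err += abs(k * items[a] * items[b] / n - pairs[(a, b)])
        total += pairs[(a, b)]
    return err / total if total else 0.0


batch = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
k = fit_scale(batch)
frequent = predict_frequent_pairs(batch, k, min_support=0.5)
```

In this sketch, mining a later batch requires only single-item counts plus the fitted factor; exact pair counting is needed only when the model is built or checked, which is where the efficiency of a count-prediction approach would come from under these assumptions.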
