A change detector for mining frequent patterns over evolving data streams

Mining data streams for frequent patterns is important in many applications. Unlike traditional static databases, data streams are generated by an underlying process that evolves over time. Past data may therefore become outdated and of little use compared with the most recent data. When a significant change occurs and is not handled properly, it can seriously degrade the mining results. In this paper, an online algorithm for change detection in frequent pattern mining is proposed. Although several studies have focused mainly on adapting to changes, we contend that adaptation alone is not enough: the ability to detect and characterize change is essential in many applications. A novel test strategy is designed to gather "evidence" sufficient to conclude whether the current sample differs significantly from a reference sample. Different statistical tests are evaluated, and our study shows that the chi-square test is the most suitable for enumerated or count data.
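To make the reference-versus-current comparison concrete, the sketch below applies a plain chi-square test of homogeneity to pattern counts. It is not the evidence-gathering strategy proposed in the paper. It assumes that each pattern's support is recorded as the number of transactions containing it in a reference window and in a current window, and that each pattern is tested on its own 2x2 table (contains / does not contain, by window) with one degree of freedom and no Yates correction. The names `Pattern`, `chi2_sf`, `support_change_test` and `detect_changes` are hypothetical. The p-value is computed with the standard series and continued-fraction expansions of the regularized incomplete gamma function, so only the Python standard library is needed.

```python
import math
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]  # hypothetical itemset representation, e.g. ("bread", "milk")


def _lower_gamma_series(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) by series; accurate for x < a + 1."""
    term = total = 1.0 / a
    denom = a
    for _ in range(1000):
        denom += 1.0
        term *= x / denom
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_gamma_cf(a: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x) by Lentz's continued fraction; for x >= a + 1."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_sf(stat: float, df: int) -> float:
    """P(X >= stat) for X ~ chi-square(df), i.e. the test's p-value."""
    if stat <= 0.0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    if x < a + 1.0:
        return 1.0 - _lower_gamma_series(a, x)
    return _upper_gamma_cf(a, x)


def support_change_test(ref_hits: int, ref_total: int,
                        cur_hits: int, cur_total: int) -> Tuple[float, float]:
    """Chi-square statistic and p-value for H0: the pattern has equal support in both windows."""
    observed = [[ref_hits, ref_total - ref_hits],
                [cur_hits, cur_total - cur_hits]]
    rows = [sum(r) for r in observed]
    cols = [observed[0][j] + observed[1][j] for j in range(2)]
    n = rows[0] + rows[1]
    if n == 0:
        return 0.0, 1.0  # no transactions at all: no evidence of change
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            if expected > 0:  # an all-zero column contributes nothing
                stat += (observed[i][j] - expected) ** 2 / expected
    return stat, chi2_sf(stat, 1)


def detect_changes(reference: Dict[Pattern, int], ref_total: int,
                   current: Dict[Pattern, int], cur_total: int,
                   alpha: float = 0.05) -> List[Pattern]:
    """Return patterns whose support differs significantly between the two windows."""
    flagged = []
    for pattern in set(reference) | set(current):
        _, p_value = support_change_test(reference.get(pattern, 0), ref_total,
                                         current.get(pattern, 0), cur_total)
        if p_value < alpha:
            flagged.append(pattern)
    return sorted(flagged)


if __name__ == "__main__":
    ref = {("a",): 50, ("b",): 50}   # hits in a 100-transaction reference window
    cur = {("a",): 50, ("b",): 10}   # hits in a 100-transaction current window
    print(detect_changes(ref, 100, cur, 100))
```

Testing each pattern on its own 2x2 table avoids treating overlapping itemsets as mutually exclusive categories, which they are not, since one transaction can contain many patterns. When many patterns are tested at once, a multiple-comparison correction such as Bonferroni would likely be needed, and the chi-square approximation is only reliable when expected cell counts are reasonably large (a common rule of thumb is at least 5).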
