StreamKrimp: Detecting Change in Data Streams

Data streams are ubiquitous. Examples range from sensor networks to financial transactions and website logs. In fact, even market basket data can be seen as a stream of sales. Detecting changes in the distribution from which a stream is sampled is one of the most challenging problems in stream mining, as only limited storage can be used. In this paper we analyse this problem for streams of transaction data from an MDL perspective. Based on this analysis we introduce the StreamKrimp algorithm, which uses the Krimp algorithm to characterise probability distributions with code tables. With these code tables, StreamKrimp partitions the stream into a sequence of substreams. Each switch of code table indicates a change in the underlying distribution. Experiments on both real and artificial streams show that StreamKrimp detects the changes while using only a very limited amount of data storage.
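The idea of switching code tables when compression degrades can be illustrated with a minimal sketch. It is not the StreamKrimp algorithm itself: as a simplifying assumption, each "code table" here is just a set of Shannon code lengths over single items (Krimp builds code tables over itemsets), the stream is processed in fixed-size blocks, and the names `fit_code_lengths`, `encoded_size`, `detect_changes`, `block_size`, and `min_gain` are hypothetical choices made for this example.

```python
import math
from collections import Counter


def fit_code_lengths(block, alphabet):
    """Assumed stand-in for a code table: Shannon code lengths (in bits)
    per single item, estimated from item counts with add-one smoothing."""
    counts = Counter(item for transaction in block for item in transaction)
    total = sum(counts.values()) + len(alphabet)
    return {item: -math.log2((counts[item] + 1) / total) for item in alphabet}


def encoded_size(block, code_lengths):
    """Total number of bits needed to encode every item in the block."""
    return sum(code_lengths[item] for transaction in block for item in transaction)


def detect_changes(stream, alphabet, block_size=50, min_gain=0.1):
    """Return the start indices of blocks at which the code table is switched.

    A switch happens when a code table fitted on the new block compresses
    that block at least `min_gain` (relative) better than the current one.
    """
    change_points = []
    current = fit_code_lengths(stream[:block_size], alphabet)
    for start in range(block_size, len(stream) - block_size + 1, block_size):
        block = stream[start:start + block_size]
        candidate = fit_code_lengths(block, alphabet)
        old_size = encoded_size(block, current)
        new_size = encoded_size(block, candidate)
        if old_size > 0 and (old_size - new_size) / old_size > min_gain:
            change_points.append(start)  # distribution appears to have changed
            current = candidate          # keep only the newest code table
    return change_points


# Toy stream: 100 transactions over {a, b}, then 100 over {c, d}.
alphabet = ["a", "b", "c", "d"]
stream = [("a", "b")] * 100 + [("c", "d")] * 100
print(detect_changes(stream, alphabet))  # [100]
```

Only the current code table and one block of transactions are held at any time, which mirrors the limited-storage setting described above. The relative-gain threshold is an assumption of this sketch, not a parameter taken from the paper.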
