Summarizing Cluster Evolution in Dynamic Environments

Monitoring and interpreting changing patterns is a task of paramount importance for data mining applications in dynamic environments. While there is much research on adapting patterns in the presence of drift or shift, there is less research on how to maintain an overview of pattern changes over time. A major challenge lies in summarizing changes effectively, so that the user can understand the nature of the change while the demand on resources remains low. To this end, we propose FINGERPRINT, an environment for the summarization of cluster evolution. Cluster changes are captured in an "evolution graph", which is then summarized into a fingerprint of evolution by merging similar clusters. We propose a batch summarization method that traverses and summarizes the evolution graph as a whole, and an incremental method that is applied during the process of cluster transition discovery. We present experiments on different data streams and discuss the space reduction and information preservation achieved by the two methods.
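The idea of merging similar clusters along the evolution graph can be illustrated with a minimal sketch. It is not the FINGERPRINT algorithm itself; it rests on assumptions that go beyond the abstract: each cluster is represented by its centroid, one path of the graph (here called a trace) is given as a time-ordered list of centroids, similarity is measured by Euclidean distance, and `summarize_trace` and `max_distance` are hypothetical names. The loop handles clusters in arrival order, which loosely mirrors the incremental setting.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def summarize_trace(trace, max_distance):
    """Greedily merge consecutive cluster centroids along one trace.

    trace: time-ordered list of centroids (tuples of floats); an assumed
           representation, not taken from the paper.
    max_distance: hypothetical similarity threshold; a centroid closer than
           this to the current summary node is merged into it.
    Returns a list of (summary_centroid, number_of_merged_clusters).
    """
    summary = []
    for centroid in trace:
        if summary:
            rep, n = summary[-1]
            if dist(rep, centroid) <= max_distance:
                # Fold the new centroid into the running mean of the node.
                merged = tuple((r * n + c) / (n + 1) for r, c in zip(rep, centroid))
                summary[-1] = (merged, n + 1)
                continue
        # Too dissimilar (or first cluster): start a new summary node.
        summary.append((tuple(centroid), 1))
    return summary


if __name__ == "__main__":
    trace = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.0, 5.2)]
    for node, size in summarize_trace(trace, max_distance=1.0):
        print(node, size)
```

In this toy trace, four clusters collapse into two summary nodes, which is the kind of space reduction the abstract refers to. How much information is lost depends on the threshold and on the chosen similarity measure.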
