Automatic Detection of Cluster Structure Changes using Relative Density Self-Organizing Maps

Knowledge of clustering changes in real-life datasets is important in many contexts, such as customer attrition analysis and fraud detection. Organizations can use this knowledge to adapt their business strategies to changing circumstances. To understand what has changed, analysts need to relate the knowledge acquired from a newer dataset to that acquired from an earlier one. There are two kinds of clustering changes: changes in clustering structure and changes in cluster membership. The key contribution of this paper is a novel method that automatically detects structural changes between two snapshot datasets using Relative Density Self-Organizing Maps (ReDSOM). The method identifies emerging, disappearing, splitting, merging, enlarging, and shrinking clusters. Evaluation on synthetic datasets demonstrates that the method can automatically identify structural cluster changes. Moreover, the changes identified in our evaluation on real-life datasets from the World Bank can be related to actual changes.
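As a rough illustration of the relative-density idea behind comparing two snapshots, the sketch below estimates the density of each snapshot at a fixed set of reference points and reports the log ratio. It is not the ReDSOM procedure itself, which the abstract does not specify: the Gaussian kernel density estimate, the `bandwidth` and `eps` parameters, the function names, and the hand-picked reference points (standing in for trained SOM prototype vectors) are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kde_at(points, data, bandwidth):
    """Gaussian kernel density estimate of `data`, evaluated at each row of `points`."""
    diffs = points[:, None, :] - data[None, :, :]   # shape: (n_points, n_data, dim)
    sq_dist = np.sum(diffs ** 2, axis=2)
    dim = data.shape[1]
    norm = (2.0 * np.pi * bandwidth ** 2) ** (dim / 2.0)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2)).mean(axis=1) / norm


def log_relative_density(points, data_t1, data_t2, bandwidth=0.5, eps=1e-12):
    """log2 of density(t2) / density(t1) at each reference point.

    `eps` (an assumed smoothing constant) keeps the ratio finite in empty regions.
    Positive values: the region is denser in the later snapshot.
    Negative values: the region is sparser in the later snapshot.
    """
    dens_t1 = gaussian_kde_at(points, data_t1, bandwidth)
    dens_t2 = gaussian_kde_at(points, data_t2, bandwidth)
    return np.log2((dens_t2 + eps) / (dens_t1 + eps))


# Toy snapshots: one cluster at the origin in t1; in t2 a second cluster
# appears around (5, 5), so the origin holds a smaller share of the data.
rng = np.random.default_rng(0)
cloud = rng.normal(0.0, 0.3, size=(100, 2))
snapshot_t1 = cloud
snapshot_t2 = np.vstack([cloud, cloud + 5.0])

# Hypothetical reference points standing in for SOM prototype vectors.
reference_points = np.array([[0.0, 0.0], [5.0, 5.0]])
print(log_relative_density(reference_points, snapshot_t1, snapshot_t2))
```

Under these assumptions, a large positive value at a reference point with almost no earlier data would be a candidate for an emerging cluster, and a large negative value for a disappearing one. Distinguishing splitting, merging, enlarging, and shrinking clusters would additionally require looking at how such regions connect, which this sketch does not attempt.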