Adaptive evolutionary clustering

In many practical applications of clustering, the objects to be clustered evolve over time, and a clustering result is desired at each time step. In such applications, evolutionary clustering typically outperforms traditional static clustering by producing clustering results that reflect long-term trends while being robust to short-term variations. Several evolutionary clustering algorithms have recently been proposed, often by adding a temporal smoothness penalty to the cost function of a static clustering method. In this paper, we introduce a different approach to evolutionary clustering: we accurately track the time-varying proximities between objects and then apply static clustering. We present an evolutionary clustering framework that adaptively estimates the optimal smoothing parameter using shrinkage estimation, a statistical approach that improves a naïve estimate using additional information. The proposed framework can be used to extend a variety of static clustering algorithms, including hierarchical, k-means, and spectral clustering, into evolutionary clustering algorithms. Experiments on synthetic and real data sets indicate that the proposed framework outperforms static clustering and existing evolutionary clustering algorithms in many scenarios.
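The abstract does not give the form of the tracked proximities, so the following is only a minimal sketch under stated assumptions. It assumes a proximity matrix `w_t` is observed at each step, that the tracked estimate is the convex combination `psi_t = alpha * psi_prev + (1 - alpha) * w_t` with a smoothing (forgetting) factor `alpha`, and that `w_t` is an unbiased, noisy observation of a true mean `mu`. Treating `psi_prev` as fixed and minimizing the expected squared error gives `alpha = sum(var) / (sum((psi_prev - mu)^2) + sum(var))`; the "additional information" used here to estimate `mu` and `var` is an assumed block structure given by the previous step's cluster labels. The function names `block_moments`, `shrinkage_alpha`, and `smooth` are hypothetical, and the final static clustering step (k-means, spectral, hierarchical) is left out.

```python
"""Hypothetical sketch: track proximities with a shrinkage-estimated smoothing factor.

Assumptions (not from the source): psi_t = alpha * psi_prev + (1 - alpha) * w_t,
and entry means/variances of w_t are pooled within blocks defined by the
previous time step's cluster labels.
"""
from collections import defaultdict


def block_moments(w, labels):
    """Pool w[i][j] over each (cluster, cluster) block; return block means and variances."""
    groups = defaultdict(list)
    for i, row in enumerate(w):
        for j, x in enumerate(row):
            groups[(labels[i], labels[j])].append(x)
    mean, var = {}, {}
    for block, vals in groups.items():
        m = sum(vals) / len(vals)
        mean[block] = m
        # Unbiased sample variance; 0 when a block holds a single entry.
        var[block] = (sum((v - m) ** 2 for v in vals) / (len(vals) - 1)
                      if len(vals) > 1 else 0.0)
    return mean, var


def shrinkage_alpha(psi_prev, w, labels):
    """Estimate alpha = sum(var) / (sum((psi_prev - mean)^2) + sum(var)), clipped to [0, 1]."""
    mean, var = block_moments(w, labels)
    num = den = 0.0
    for i, row in enumerate(psi_prev):
        for j, p in enumerate(row):
            block = (labels[i], labels[j])
            num += var[block]
            den += (p - mean[block]) ** 2 + var[block]
    if den == 0.0:
        return 0.0  # psi_prev already equals w_t, so alpha does not matter
    return min(1.0, max(0.0, num / den))


def smooth(psi_prev, w, alpha):
    """Convex combination of the previous estimate and the current observation."""
    return [[alpha * p + (1 - alpha) * x for p, x in zip(rp, rw)]
            for rp, rw in zip(psi_prev, w)]


if __name__ == "__main__":
    labels = [0, 0, 1, 1]  # hypothetical clusters from the previous time step
    psi_prev = [[1, .9, .1, .1], [.9, 1, .1, .1], [.1, .1, 1, .9], [.1, .1, .9, 1]]
    w_t = [[1, .7, .2, 0], [.7, 1, .3, .1], [.2, .3, 1, .8], [0, .1, .8, 1]]
    alpha = shrinkage_alpha(psi_prev, w_t, labels)
    psi_t = smooth(psi_prev, w_t, alpha)  # pass psi_t to any static clustering routine
    print(alpha)
```

A large within-block variance (noisy `w_t`) pushes `alpha` toward 1 and leans on the history, whereas a large gap between `psi_prev` and the block means (an apparent change) pushes it toward 0.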
