Temporal Data Clustering via Weighted Clustering Ensemble with Different Representations

Temporal data clustering provides underpinning techniques for discovering the intrinsic structure of temporal data and condensing the information it contains. In this paper, we present a temporal data clustering framework based on a weighted clustering ensemble of multiple partitions, which are produced by initial clustering analysis on different temporal data representations. In our approach, we propose a novel weighted consensus function, guided by clustering validation criteria, to reconcile the initial partitions into candidate consensus partitions from different perspectives, and then introduce an agreement function to further reconcile those candidate consensus partitions into a final partition. As a result, the proposed weighted clustering ensemble algorithm provides an effective enabling technique for the joint use of different representations, which reduces the information loss incurred by a single representation and exploits the various information sources underlying temporal data. In addition, our approach tends to capture the intrinsic structure of a data set, e.g., the number of clusters. Our approach has been evaluated on benchmark time series, motion trajectory, and time-series data stream clustering tasks. Simulation results demonstrate that our approach yields favorable results for a variety of temporal data clustering tasks. As our weighted clustering ensemble algorithm can combine any input partitions to generate a clustering ensemble, we also investigate its limitations through formal analysis and empirical studies.
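The general idea of weighting partitions before reconciling them can be illustrated with a minimal sketch. It is not the consensus or agreement function of the paper: it assumes a weighted co-association matrix as the consensus representation and a simple threshold cut to extract a partition, and the function names, the weights, and the threshold are all hypothetical. In the approach described above the weights would come from clustering validation criteria; here they are supplied by hand.

```python
import numpy as np

def co_association(labels):
    """Binary matrix: entry (i, j) is 1 when items i and j share a cluster."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def weighted_consensus(partitions, weights):
    """Weighted average of co-association matrices (weights are normalized)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * co_association(p) for w, p in zip(weights, partitions))

def cut_consensus(similarity, threshold=0.5):
    """Label connected components of the graph with edges above the threshold.

    This stands in for a real consensus step; it is an assumption of this sketch.
    """
    n = similarity.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(similarity[i] > threshold):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Three hypothetical partitions of six items, e.g. from three representations.
partitions = [
    [0, 0, 1, 1, 2, 2],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
]
# Hypothetical weights standing in for clustering validation scores.
weights = [0.6, 0.25, 0.15]

consensus = weighted_consensus(partitions, weights)
labels = cut_consensus(consensus, threshold=0.5)
print(labels.tolist())
```

With these hand-picked weights the first partition dominates, so the cut recovers its three clusters. Changing the weights changes which partition the consensus favors, which is why the choice of validation criterion matters in a weighted ensemble.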
