Effective and Robust Mining of Temporal Subspace Clusters

Mining temporal multivariate data by clustering is an important research topic. In today's complex data, interesting patterns often span neither all dimensions nor the full temporal extent of the data domain. This challenge is met by temporal subspace clustering methods. Their effectiveness, however, is impeded by two aspects that are unavoidable in real-world data: misalignments between time series, caused for example by out-of-sync sensors, and measurement errors. Under these conditions, existing temporal subspace clustering approaches miss the patterns contained in the data. In this paper, we propose a novel clustering method that mines temporal subspace clusters, each represented by a set of objects and a set of relevant intervals. We handle misaligned time series flexibly by adaptively shifting them in the time domain, and we achieve robustness to measurement errors by allowing a certain fraction of deviating values at each relevant point in time. We show the effectiveness of our method in experiments on real and synthetic data.
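The two ideas named in the abstract, shifting series in time and tolerating a fraction of deviating values per time point, can be illustrated with a small sketch. This is not the algorithm proposed in the paper: the function names, the integer-shift search, the use of the median as the cluster center, and the parameters `eps`, `max_shift`, and `max_dev_frac` are all assumptions made for illustration only.

```python
from statistics import median


def shift_series(series, shift):
    """Move every value `shift` steps along the time axis.

    Positions that receive no value are filled with None.
    """
    n = len(series)
    out = [None] * n
    for t, value in enumerate(series):
        j = t + shift
        if 0 <= j < n:
            out[j] = value
    return out


def best_shift(reference, series, max_shift, eps):
    """Hypothetical alignment step: try every integer shift in
    [-max_shift, max_shift] and keep the one with the most time points
    lying within `eps` of the reference (ties favor smaller shifts)."""
    def matches(shift):
        shifted = shift_series(series, shift)
        return sum(
            1
            for r, s in zip(reference, shifted)
            if s is not None and abs(r - s) <= eps
        )

    candidates = range(-max_shift, max_shift + 1)
    return max(candidates, key=lambda s: (matches(s), -abs(s)))


def relevant_time_points(aligned, eps, max_dev_frac):
    """Hypothetical robustness step: a time point counts as relevant if at
    most `max_dev_frac` of the objects deviate by more than `eps` from the
    median of the observed values. Missing values count as deviating."""
    n_objects = len(aligned)
    relevant = []
    for t in range(len(aligned[0])):
        present = [s[t] for s in aligned if s[t] is not None]
        if not present:
            continue
        center = median(present)
        deviating = sum(
            1 for s in aligned if s[t] is None or abs(s[t] - center) > eps
        )
        if deviating <= max_dev_frac * n_objects:
            relevant.append(t)
    return relevant


# Toy data: one clean series, one lagging by a step, one with an outlier.
reference = [0, 0, 5, 5, 0, 0, 0, 0]
lagged = [0, 0, 0, 5, 5, 0, 0, 0]
noisy = [0, 0, 5, 9, 0, 0, 0, 0]

shift = best_shift(reference, lagged, max_shift=2, eps=0.5)
aligned = [reference, shift_series(lagged, shift), noisy]
tolerant = relevant_time_points(aligned, eps=0.5, max_dev_frac=0.34)
strict = relevant_time_points(aligned, eps=0.5, max_dev_frac=0.0)
```

In this toy setting, the tolerant variant keeps the time point with the outlier, while the strict variant (`max_dev_frac=0.0`) drops it along with the position left empty by the shift. A real method would additionally have to search over object sets, dimensions, and intervals, which this sketch does not attempt.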
