Tracing Evolving Subspace Clusters in Temporal Climate Data

Analysis of temporal climate data is an active research area. Data mining methods designed specifically for temporal data support domain experts in understanding phenomena such as climate change, which is crucial for a sustainable world. Cluster tracing approaches, which mine the temporal evolution of clusters, are an important class of solutions for temporal data. In general, a cluster represents a group of objects with similar values. In a temporal context such as tracing, similar values correspond to similar behavior within one snapshot in time. Each cluster can thus be interpreted as a behavior type, and cluster tracing corresponds to tracking similar behaviors over time. Existing tracing approaches target datasets that satisfy two specific conditions: the clusters appear in all attributes, i.e., they are fullspace clusters, and the data objects have unique identifiers. These identifiers are used to track clusters by measuring the number of objects two clusters have in common, i.e., clusters are traced based on similar object sets. These conditions, however, are strict. First, in complex data, clusters are often hidden in individual subsets of the dimensions. Second, mapping clusters based on similar object sets does not reflect the idea of tracing similar behavior types over time, because similar behavior can be represented even by clusters that have no objects in common. A tracing method based on similar object values is therefore needed. In this paper, we introduce a novel approach that traces subspace clusters based on object value similarity. Neither subspace tracing nor tracing by object value similarity has been done before.
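The contrast between tracing by shared objects and tracing by similar values can be illustrated with a small sketch. It is not the method introduced in the paper: the function names, the toy measurements, the attribute names, and the centroid-based distance are all assumptions chosen for illustration. Two clusters from consecutive snapshots share no object identifiers, so an overlap-based measure reports no match, while a comparison of values on the shared dimensions shows that they describe nearly the same behavior.

```python
from statistics import mean


def jaccard(ids_a, ids_b):
    """Object-set overlap of two clusters, based on object identifiers."""
    a, b = set(ids_a), set(ids_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def centroid(cluster, dims):
    """Mean value per dimension; cluster maps object id -> {dimension: value}."""
    return [mean(values[d] for values in cluster.values()) for d in dims]


def value_distance(cluster_a, dims_a, cluster_b, dims_b):
    """Hypothetical value-based measure: largest centroid gap on the
    dimensions that both subspace clusters have in common."""
    shared = sorted(set(dims_a) & set(dims_b))
    if not shared:
        return None  # no common subspace, so the clusters are not comparable
    ca = centroid(cluster_a, shared)
    cb = centroid(cluster_b, shared)
    return max(abs(x - y) for x, y in zip(ca, cb))


# Toy data (assumed): cluster at time t, found in subspace {temp, salinity}.
cluster_t = {
    "s1": {"temp": 14.0, "salinity": 35.0},
    "s2": {"temp": 14.4, "salinity": 35.2},
}
# Cluster at time t+1 with entirely different objects but similar values.
cluster_t1 = {
    "s7": {"temp": 14.1, "salinity": 35.1},
    "s8": {"temp": 14.3, "salinity": 35.1},
}
dims = ["temp", "salinity"]

overlap = jaccard(cluster_t, cluster_t1)  # no shared identifiers
dist = value_distance(cluster_t, dims, cluster_t1, dims)

print("object overlap:", overlap)
print("value distance:", dist)
```

In this toy case the overlap is zero, so a tracing scheme based on object sets would not link the two clusters, whereas the small value distance suggests the same behavior type persisting over time.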
