Mining Semantic Time Period Similarity in Spatio-Temporal Climate Data

Over the last decade, advances in high performance computing and remote sensing have produced a vast amount of spatio-temporal data. One area where this data explosion is most prevalent is climate science. As a result, there is an increasing need to characterize large spatio-temporal datasets. One such characterization is to find periods of time that exhibit the same spatio-temporal pattern. The focus of this research is to find similar spatio-temporal patterns across semantic time periods. A semantic time period can be any arbitrary division of time, such as a year, month, or week. The proposed approach first characterizes the data spatially, using one of three methods (local entropy, local spatial autocorrelation, or local distance-based outliers) to identify interesting spatial features in the dataset. Then a location/time period matrix, analogous to a term/document matrix in natural language processing, is created to capture the spatial features for each semantic time period. Each entry of this matrix counts the number of times a spatial location is a feature of interest during a semantic time period. Latent semantic analysis is then applied to this matrix, and the cosine similarity between each pair of semantic time periods is calculated. The resulting similarities are clustered using affinity propagation. The results show that the similarity matrix produced by distance-based outliers creates the best clustering. The approach is demonstrated on a modeled global climate dataset, in which the years from 1948 to 2012 are clustered.
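
The matrix construction and similarity steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names `location_period_matrix` and `lsa_cosine_similarity`, the retained rank of 2, and the toy feature masks are all hypothetical, and the masks stand in for the output of whichever spatial characterization (entropy, autocorrelation, or outliers) is chosen.

```python
import numpy as np


def location_period_matrix(feature_masks):
    """Build the location/time-period count matrix.

    feature_masks maps a period label to a boolean array of shape
    (n_timesteps, n_locations); True marks a location flagged as a
    feature of interest at that timestep (hypothetical input format).
    Returns (labels, counts) with counts of shape (n_locations, n_periods).
    """
    labels = sorted(feature_masks)
    columns = [np.asarray(feature_masks[p], dtype=bool).sum(axis=0) for p in labels]
    return labels, np.column_stack(columns).astype(float)


def lsa_cosine_similarity(counts, rank):
    """Cosine similarity between periods in a rank-reduced (LSA) space."""
    # Thin SVD: counts = U @ diag(s) @ Vt
    _, s, vt = np.linalg.svd(counts, full_matrices=False)
    # Each period is represented by its row of (diag(s_k) @ Vt_k).T
    period_vecs = (s[:rank, None] * vt[:rank]).T
    norms = np.linalg.norm(period_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty periods
    unit = period_vecs / norms
    return unit @ unit.T


# Toy data, invented for illustration: 3 periods, 2 timesteps, 4 locations.
masks = {
    "1948": [[1, 0, 0, 1], [1, 0, 0, 1]],
    "1949": [[1, 0, 0, 1], [1, 0, 0, 0]],
    "1950": [[0, 1, 1, 0], [0, 1, 1, 0]],
}
labels, counts = location_period_matrix(masks)
sim = lsa_cosine_similarity(counts, rank=2)
print(labels)
print(np.round(sim, 3))
```

In this toy case the first two periods flag nearly the same locations and so end up close in the reduced space, while the third is dissimilar to both. A similarity matrix of this form could then be passed to a clustering routine that accepts precomputed similarities, for example scikit-learn's `AffinityPropagation` with `affinity="precomputed"`.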
