Discovery of climate indices using clustering

To analyze the effect of the oceans and atmosphere on land climate, Earth scientists have developed climate indices, which are time series that summarize the behavior of selected regions of the Earth's oceans and atmosphere. In the past, Earth scientists have used observation and, more recently, eigenvalue analysis techniques, such as principal components analysis (PCA) and singular value decomposition (SVD), to discover climate indices. However, eigenvalue techniques are only useful for finding a few of the strongest signals. Furthermore, they require all discovered signals to be orthogonal to each other, which makes it difficult to attach a physical interpretation to them. This paper presents an alternative, clustering-based methodology for the discovery of climate indices that overcomes these limitations and is based on clusters that represent regions with relatively homogeneous behavior. The centroids of these clusters are time series that summarize the behavior of the ocean or atmosphere in those regions. Some of these centroids correspond to known climate indices and provide a validation of our methodology; other centroids are variants of known indices that may provide better predictive power for some land areas; and still other centroids may represent potentially new Earth science phenomena. Finally, we show that cluster-based indices generally outperform SVD-derived indices, both in terms of area-weighted correlation and direct correlation with the known indices.
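The idea of using cluster centroids as candidate indices can be illustrated with a minimal sketch. It is not the paper's implementation: plain k-means stands in for whatever clustering algorithm the paper uses, the data are synthetic, and the function names (`kmeans_time_series`, `area_weighted_correlation`) are hypothetical. The area-weighted correlation is assumed here to be the mean absolute Pearson correlation between an index and each land point, weighted by the cosine of latitude as a proxy for grid-cell area.

```python
import numpy as np


def kmeans_time_series(series, k, n_iter=50, seed=0):
    """Plain k-means on the rows of `series` (n_points x n_months).

    Stand-in for the paper's clustering step (an assumption).
    Returns cluster labels and centroids; each centroid is a time series.
    """
    rng = np.random.default_rng(seed)
    centroids = series[rng.choice(len(series), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((series[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = series[labels == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return labels, centroids


def area_weighted_correlation(index, land_series, land_lats_deg):
    """Mean |Pearson r| between `index` and each land point's series,
    weighted by cos(latitude) (assumed definition, see lead-in)."""
    idx = (index - index.mean()) / index.std()
    land = land_series - land_series.mean(axis=1, keepdims=True)
    land = land / land.std(axis=1, keepdims=True)
    corr = land @ idx / len(idx)
    weights = np.cos(np.deg2rad(land_lats_deg))
    return float(np.sum(weights * np.abs(corr)) / np.sum(weights))


# Synthetic example: two ocean regions, each driven by its own signal.
rng = np.random.default_rng(42)
n_months = 240
t = np.arange(n_months)
signal_a = np.sin(2 * np.pi * t / 48)
signal_b = np.cos(2 * np.pi * t / 30)
ocean = np.vstack([
    signal_a + 0.3 * rng.standard_normal((20, n_months)),
    signal_b + 0.3 * rng.standard_normal((20, n_months)),
])

labels, centroids = kmeans_time_series(ocean, k=2)

# Candidate indices are the centroids; compare one against a "known" signal.
centroid_a = centroids[labels[0]]
centroid_b = centroids[labels[20]]
r_known = np.corrcoef(centroid_a, signal_a)[0, 1]

# Synthetic land points influenced by signal_a only.
land_lats = np.linspace(-60.0, 60.0, 15)
land = 0.8 * signal_a + 0.5 * rng.standard_normal((15, n_months))
score_a = area_weighted_correlation(centroid_a, land, land_lats)
score_b = area_weighted_correlation(centroid_b, land, land_lats)
print(f"r with known signal: {r_known:.3f}")
print(f"area-weighted correlation: a={score_a:.3f}, b={score_b:.3f}")
```

In this toy setting, the centroid of the region driven by `signal_a` plays the role of a recovered known index, and its area-weighted correlation with the synthetic land points measures how well it summarizes the land response.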
