Comparing Predictive Power in Climate Data: Clustering Matters

Clustering methods have been applied to climate, ecological, and other environmental datasets, for example to define climate zones and to automate land-use classification. Measuring the "goodness" of such clusters is generally application-dependent and highly subjective, and it often requires domain expertise or validation against field data, which can be costly or even impossible to acquire. Here we focus on one particular task, the extraction of ocean climate indices from observed climatological data, for which the relative performance of different methods can be quantified. Specifically, we propose to extract indices from complex networks constructed from climate data, which have been shown to effectively capture the dynamical behavior of the global climate system, and we compare their predictive power with that of candidate indices obtained using other popular clustering methods. Our results demonstrate that network-based clusters are statistically significantly better predictors of land climate than the clusters produced by any of the other clustering methods considered, which could lead to a deeper understanding of climate processes and complement physics-based climate models.
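The workflow summarized above (network construction, clustering, index extraction, predictive comparison) can be illustrated with a minimal sketch. It is not the authors' implementation. All function names, the correlation threshold of 0.5, and the synthetic "ocean" and "land" series are assumptions made for illustration, and connected components of the thresholded network are used as a simple stand-in for a proper community detection algorithm.

```python
import numpy as np

def correlation_network(series, threshold=0.5):
    """Link grid points whose absolute Pearson correlation exceeds the threshold.

    series has shape (n_points, n_times). The threshold is an assumed value.
    """
    corr = np.corrcoef(series)
    adj = np.abs(corr) > threshold
    np.fill_diagonal(adj, False)  # no self-loops
    return adj

def connected_components(adj):
    """Label nodes by connected component (stand-in for community detection)."""
    n = adj.shape[0]
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in np.flatnonzero(adj[node]):
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels

def cluster_indices(series, labels):
    """One candidate index per cluster: the mean series of its members."""
    return np.stack([series[labels == k].mean(axis=0) for k in np.unique(labels)])

def holdout_rmse(indices, target, n_train):
    """Fit linear least squares on the first n_train steps, score on the rest."""
    X = np.column_stack([indices.T, np.ones(indices.shape[1])])
    coef, *_ = np.linalg.lstsq(X[:n_train], target[:n_train], rcond=None)
    resid = X[n_train:] @ coef - target[n_train:]
    return float(np.sqrt(np.mean(resid ** 2)))

# Synthetic data (hypothetical): six "ocean" points driven by two latent signals.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 8.0 * np.pi, 200, endpoint=False)
a, b = np.sin(t), np.cos(t)
ocean = np.stack([a, a, a, b, b, b]) + 0.1 * rng.standard_normal((6, t.size))
land = 2.0 * a - b + 0.1 * rng.standard_normal(t.size)  # synthetic land target

labels = connected_components(correlation_network(ocean))
indices = cluster_indices(ocean, labels)
rmse = holdout_rmse(indices, land, n_train=150)
```

Running the same `holdout_rmse` on indices derived from a different clustering of `ocean` gives a like-for-like comparison of predictive power, which is the kind of quantitative evaluation the abstract describes.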
