Clustering seasonal time series using extreme value analysis: An application to Spanish temperature time series

ABSTRACT A challenging aspect of grouping regional temperature time series is that some regions have similar summer temperatures but different winter temperatures, and vice versa. We explore this by applying cluster analysis to regional temperature time series in Spain, using as features the estimates of the location, scale, and shape parameters obtained from fitting the generalized extreme value (GEV) distribution to the block maxima and block minima of the series. Our findings reveal that the identified clusters can be meaningfully interpreted and are well validated. The motivation for this approach is that each time series is represented by just three easily extracted features. If features were instead extracted through conventional time series modeling, they would likely be affected by the uncertainty of model selection. This is not the case with GEV modeling. Furthermore, GEV modeling enables long-term projections of the maxima and minima that cannot otherwise be achieved with conventional time series modeling. For comparison purposes, we also explore clustering the block maxima and block minima of the time series. In addition, we explore the performance of this approach using simulated data.
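
The feature-extraction idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The simulated series, the 365-day block size, the Ward linkage, and the helper names `block_maxima`, `gev_features`, and `simulate` are all assumptions made for illustration. SciPy's `genextreme` uses a shape parameter `c` whose sign is opposite to the usual GEV shape, so the sketch negates it.

```python
import numpy as np
from scipy.stats import genextreme
from scipy.cluster.hierarchy import linkage, fcluster

def block_maxima(series, block_size):
    """Split a series into consecutive blocks and return each block's maximum."""
    series = np.asarray(series, dtype=float)
    n_blocks = len(series) // block_size
    trimmed = series[: n_blocks * block_size]  # drop an incomplete final block
    return trimmed.reshape(n_blocks, block_size).max(axis=1)

def gev_features(series, block_size=365):
    """Fit a GEV to the block maxima; return (location, scale, shape)."""
    c, loc, scale = genextreme.fit(block_maxima(series, block_size))
    return np.array([loc, scale, -c])  # SciPy's c is minus the usual shape

# Hypothetical stand-in for regional daily temperatures (not real data).
rng = np.random.default_rng(0)

def simulate(mean, amplitude, noise_sd, years=40):
    days = np.arange(years * 365)
    seasonal = mean + amplitude * np.sin(2 * np.pi * days / 365)
    return seasonal + rng.normal(0.0, noise_sd, size=days.size)

series = [simulate(15, 8, 2.0) for _ in range(4)] + \
         [simulate(22, 12, 4.0) for _ in range(4)]

# One row of three features per series, standardized before clustering.
features = np.vstack([gev_features(s) for s in series])
standardized = (features - features.mean(axis=0)) / features.std(axis=0)

# Hierarchical clustering (Ward linkage, an assumed choice) cut at two clusters.
labels = fcluster(linkage(standardized, method="ward"), t=2, criterion="maxclust")
print(features.round(2))
print(labels)
```

Block minima can be handled in the same way by fitting the GEV to the block maxima of the negated series and flipping the sign of the fitted location.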
