Feature Clustering for Extreme Events Analysis, with Application to Extreme Stream-Flow Data

The dependence structure of multivariate extreme events plays a special role in risk management applications, in particular in hydrology (flood risk). In a high-dimensional context (\(d>50\)), a natural first step is dimension reduction. Analyzing the tails of a dataset requires specific approaches: earlier works have proposed a definition of sparsity adapted to extremes, together with an algorithm that detects such a pattern under strong sparsity assumptions. Given a dataset that exhibits no clear sparsity pattern, we propose a clustering algorithm that groups together the features that are 'dependent at extreme level', i.e., that are likely to take extreme values simultaneously. To bypass the computational issues that arise when dealing with possibly \(O(2^d)\) subsets of features, our algorithm exploits the graphical structure stemming from the definition of the clusters, similarly to the Apriori algorithm, which drastically reduces the number of subsets to be screened. Results on simulated and real data show that our method allows a fast recovery of a meaningful summary of the dependence structure of extremes.
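The Apriori-style screening mentioned above can be illustrated with a minimal sketch. It is not the authors' algorithm: the function name `extreme_feature_groups`, the per-feature quantile thresholds, and the retention criterion (the fraction of observations in which all features of a subset are extreme, among those in which at least one is) are assumptions made for illustration only. What the sketch does share with the abstract is the pruning idea: a subset of size k+1 is examined only if all of its size-k subsets were retained, so most of the \(O(2^d)\) subsets are never screened.

```python
from itertools import combinations

import numpy as np


def extreme_feature_groups(X, quantile=0.95, min_ratio=0.3):
    """Group features that tend to be extreme simultaneously (illustrative sketch).

    Assumptions, not taken from the paper: a feature is 'extreme' when it
    exceeds its own empirical quantile, and a subset is retained when
    #(all features extreme) / #(at least one extreme) >= min_ratio.
    """
    X = np.asarray(X, dtype=float)
    thresholds = np.quantile(X, quantile, axis=0)
    exceed = X > thresholds  # boolean array, shape (n_samples, n_features)

    def score(subset):
        cols = exceed[:, list(subset)]
        n_any = np.count_nonzero(cols.any(axis=1))
        n_all = np.count_nonzero(cols.all(axis=1))
        return n_all / n_any if n_any else 0.0

    d = X.shape[1]
    # Start from pairs; each subset is a sorted tuple of feature indices.
    level = {p for p in combinations(range(d), 2) if score(p) >= min_ratio}
    kept = set(level)

    while level:
        size = len(next(iter(level)))
        candidates = set()
        # Join two retained subsets that share their first size-1 indices.
        for a in level:
            for b in level:
                if a[:-1] == b[:-1] and a[-1] < b[-1]:
                    cand = a + (b[-1],)
                    # Apriori pruning: every size-`size` subset must be retained.
                    if all(s in level for s in combinations(cand, size)):
                        candidates.add(cand)
        level = {c for c in candidates if score(c) >= min_ratio}
        kept |= level

    # Report only maximal groups (those not contained in a larger retained group).
    maximal = [s for s in kept if not any(set(s) < set(t) for t in kept)]
    return sorted(maximal)
```

Calling `extreme_feature_groups(X)` on an array of shape `(n_samples, n_features)` returns a sorted list of index tuples. Features that never co-occur at extreme level with any other feature are simply absent from the output in this sketch.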
