Semi-parametric modeling of excesses above high multivariate thresholds with censored data

Handling censored data is a recurrent issue in statistical analysis. In multivariate extremes, the dependence structure of large observations can be characterized by a nonparametric angular measure, while marginal excesses above asymptotically large thresholds follow a parametric distribution. In this work, a flexible semi-parametric Dirichlet mixture model for angular measures is adapted to censored data and missing components. One major issue is to account for censoring intervals that overlap the extremal threshold, when it is not known whether the corresponding hidden data are actually extreme. A second issue is that the censored likelihood needed for Bayesian inference has no analytic expression. The first issue is tackled using a Poisson process model for extremes, whereas a data augmentation scheme avoids multivariate integration of the Poisson process intensity over both the censored intervals and the failure region above the threshold. The implemented MCMC algorithm allows simultaneous estimation of marginal and dependence parameters, so that all sources of uncertainty other than model bias are captured by posterior credible intervals. The method is illustrated on simulated and real data.
