Predictive Analytics for Extreme Events in Big Data

This paper presents an efficient computational methodology for longitudinal and cross-sectional analysis of extreme event statistics in large data sets. The analyzed data are available across multiple time periods and multiple individuals in a population. Some periods and individuals might have no extreme events, while others might have abundant data. The extreme events are modeled with a Pareto or exponential tail distribution. The proposed approach to longitudinal and cross-sectional analysis of the tail models is based on a non-parametric Bayesian formulation. The maximum a posteriori probability problem leads to two convex problems for the tail parameters. Solving one problem yields the trends of the tail decay rate across the population and time periods. Solving the other gives the trends of the tail quantile level. The approach is illustrated by analyzing 10- and 100-year extreme event risks for extreme climate events and for peak power loads in electrical utility data.
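To make the two tail parameters concrete, the following is a minimal sketch of a plain maximum-likelihood fit of an exponential tail to a single data set, followed by 10- and 100-year level estimates. It is not the paper's non-parametric Bayesian method and does not model trends across periods or individuals. The function names, the 95% threshold choice, the daily sampling rate, and the synthetic data are all assumptions made for illustration.

```python
import math
import numpy as np

def fit_exponential_tail(samples, threshold):
    """Fit P(X > x | X > threshold) = exp(-rate * (x - threshold)) by maximum likelihood.

    Hypothetical helper: returns the tail decay rate and the empirical
    probability of exceeding the threshold.
    """
    samples = np.asarray(samples, dtype=float)
    excess = samples[samples > threshold] - threshold
    if excess.size == 0:
        raise ValueError("no samples exceed the threshold")
    rate = 1.0 / excess.mean()                 # MLE of the exponential decay rate
    exceed_prob = excess.size / samples.size   # empirical P(X > threshold)
    return rate, exceed_prob

def return_level(threshold, rate, exceed_prob, samples_per_year, years):
    """Level exceeded once on average in `years` years under the fitted tail."""
    expected_exceedances = samples_per_year * years * exceed_prob
    return threshold + math.log(expected_exceedances) / rate

# Synthetic stand-in for ten years of daily observations (assumption, not real data).
rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=10 * 365)

u = float(np.quantile(data, 0.95))             # assumed threshold: empirical 95% quantile
rate, p_u = fit_exponential_tail(data, u)
level_10 = return_level(u, rate, p_u, samples_per_year=365, years=10)
level_100 = return_level(u, rate, p_u, samples_per_year=365, years=100)
print(f"decay rate={rate:.3f}, 10-year level={level_10:.2f}, 100-year level={level_100:.2f}")
```

In this sketch the decay rate plays the role of the tail decay parameter and the threshold `u` plays the role of the tail quantile level. Under an exponential tail, the 100-year level exceeds the 10-year level by exactly `log(10) / rate`, which shows why the estimate of the decay rate dominates long-horizon risk estimates.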
