Hybrid data mining-regression for infrastructure risk assessment based on zero-inflated data

Infrastructure disaster risk assessment seeks to estimate the probability that a given customer or area loses service during a disaster, sometimes together with the duration of each outage. These estimates are usually based on past data about the effects of similar events on the same or similar systems. In many situations, past performance data from infrastructure systems are zero-inflated: they contain more zeros than standard probability distributions can appropriately model. The data are also often non-linear and exhibit threshold effects because of the complexities of infrastructure system performance. Standard zero-inflated statistical models, such as zero-inflated Poisson and zero-inflated negative binomial regression models, do not adequately capture these complexities. In this paper we develop a novel hybrid classification tree/regression method for complex, zero-inflated data sets. We investigate its predictive accuracy on a large number of simulated data sets and then demonstrate its practical usefulness with an application to hurricane power outage risk assessment for a large utility, using actual data from that utility. Although formulated for infrastructure disaster risk assessment, the method is promising for data-driven analysis of other situations with complex, zero-inflated data that exhibit response thresholds.
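
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general two-stage idea, not the authors' method. It assumes a single predictor, uses a one-split decision stump as a stand-in for a classification tree (zero versus non-zero response), and uses an ordinary least-squares line as a stand-in for the regression stage fitted to the non-zero observations. All function names and the synthetic "wind speed" data are hypothetical.

```python
def fit_stump(xs, is_nonzero):
    """One-split classifier: predict 'non-zero' when x > threshold.

    Tries every observed x as a threshold and keeps the first one
    with the fewest misclassifications.
    """
    best_t, best_err = None, len(xs) + 1
    for t in sorted(set(xs)):
        err = sum((x > t) != nz for x, nz in zip(xs, is_nonzero))
        if err < best_err:
            best_t, best_err = t, err
    return best_t


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y - slope * mean_x, slope


def fit_hybrid(xs, ys):
    """Stage 1: classify zero vs. non-zero. Stage 2: regress on non-zeros only."""
    threshold = fit_stump(xs, [y > 0 for y in ys])
    positives = [(x, y) for x, y in zip(xs, ys) if y > 0]
    intercept, slope = fit_line([p[0] for p in positives],
                                [p[1] for p in positives])
    return threshold, intercept, slope


def predict(model, x):
    """Return 0 below the learned threshold, else the clipped regression value."""
    threshold, intercept, slope = model
    if x <= threshold:
        return 0.0
    return max(0.0, intercept + slope * x)


# Synthetic, noise-free data (an assumption for illustration only):
# no outages up to a wind speed of 30, then a linear increase.
xs = [i * 0.5 for i in range(121)]                      # 0.0, 0.5, ..., 60.0
ys = [0.0 if x <= 30 else 2.0 * (x - 30) for x in xs]   # about half are zeros

model = fit_hybrid(xs, ys)
print(model)
print(predict(model, 10.0), predict(model, 50.0))
```

Fitting the regression only to the non-zero cases keeps the many zeros from flattening the fitted slope, while the classification stage captures the threshold below which no response occurs. A real application would use a full classification tree over many covariates and a count or duration regression model in place of these stand-ins.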
