Modeling Malicious Hacking Data Breach Risks

Malicious hacking data breaches cause millions of dollars in financial losses each year, and more companies are seeking cyber insurance coverage. The lack of suitable statistical approaches for scoring breach risks is an obstacle for the insurance industry. We propose a novel frequency–severity model to analyze hacking breach risks at the individual company level, which would be valuable for underwriting. We find that breach frequency can be modeled by a hurdle Poisson model, rather than the negative binomial model used in the literature. Breach severity shows a heavy tail that can be captured by a nonparametric–generalized Pareto distribution (GPD) model. We further discover a positive nonlinear dependence between frequency and severity, which our model also accommodates. Both in-sample and out-of-sample studies show that the proposed frequency–severity model with nonlinear dependence performs satisfactorily and outperforms the competing models, including the independent frequency–severity model and the Tweedie model.
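The two marginal components named above can be illustrated with a minimal sketch. It assumes a hurdle Poisson with a single zero probability `pi0` and a zero-truncated Poisson rate `lam` (no company covariates). For severity, it replaces the nonparametric body with a plain empirical CDF spliced to a GPD tail above a threshold `u`. All function and parameter names are hypothetical, and the empirical-body simplification is an assumption, not the authors' implementation. The frequency–severity dependence is not modeled here.

```python
import math
from typing import Sequence


def hurdle_poisson_pmf(k: int, pi0: float, lam: float) -> float:
    """P(N = k) for a hurdle Poisson breach count (hypothetical parameterization).

    pi0: probability of zero hacking breaches.
    lam: rate (> 0) of the Poisson law that, truncated at zero,
         governs the positive counts.
    """
    if k < 0:
        return 0.0
    if k == 0:
        return pi0
    # Poisson(k; lam) evaluated on the log scale for numerical stability.
    log_pois = -lam + k * math.log(lam) - math.lgamma(k + 1)
    # Renormalize over k >= 1 (zero-truncated Poisson), weighted by 1 - pi0.
    return (1.0 - pi0) * math.exp(log_pois) / (1.0 - math.exp(-lam))


def hurdle_poisson_mean(pi0: float, lam: float) -> float:
    """E[N] = (1 - pi0) * lam / (1 - exp(-lam))."""
    return (1.0 - pi0) * lam / (1.0 - math.exp(-lam))


def gpd_survival(y: float, xi: float, sigma: float) -> float:
    """P(Y > y) for a GPD with shape xi and scale sigma (> 0)."""
    if y <= 0:
        return 1.0
    if xi == 0:
        return math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    if base <= 0:  # beyond the finite upper endpoint when xi < 0
        return 0.0
    return base ** (-1.0 / xi)


def spliced_severity_cdf(x: float, sample: Sequence[float], u: float,
                         xi: float, sigma: float) -> float:
    """Empirical body at or below u, GPD tail above u.

    Assumption: an empirical CDF stands in for a nonparametric
    (e.g. kernel) body. The tail is scaled by the empirical exceedance
    probability phi_u so the CDF is continuous at u.
    """
    n = len(sample)
    if x <= u:
        return sum(s <= x for s in sample) / n
    phi_u = sum(s > u for s in sample) / n
    return 1.0 - phi_u * gpd_survival(x - u, xi, sigma)


def spliced_tail_quantile(p: float, sample: Sequence[float], u: float,
                          xi: float, sigma: float) -> float:
    """Quantile (e.g. a VaR level) for p in the GPD region, p > 1 - phi_u."""
    phi_u = sum(s > u for s in sample) / len(sample)
    q = (1.0 - p) / phi_u  # required GPD survival probability
    if xi == 0:
        return u - sigma * math.log(q)
    return u + sigma / xi * (q ** (-xi) - 1.0)
```

In a real analysis `pi0` and `lam` would typically depend on company-level covariates, and `xi` and `sigma` would be estimated (for instance by maximum likelihood on exceedances over `u`); those fitting steps are omitted from this sketch.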
