Application of statistical techniques to proportional loss data: Evaluating the predictive accuracy of physical vulnerability to hazardous hydro-meteorological events.

Knowledge about the causes of differential structural damage following hazardous hydro-meteorological events can inform more effective risk management and spatial planning solutions. While previous studies have described relationships between physical vulnerability and features describing building properties, the immediate environment and event intensity proxies, several key challenges remain. In particular, observations, especially those associated with high-magnitude events, are limited, as are studies designed to evaluate a comprehensive range of predictive features. To build upon previous developments, we described a workflow to support the continued development and assessment of empirical, multivariate physical vulnerability functions based on predictive accuracy. Within this workflow, we evaluated several statistical approaches, namely generalized linear models and their more complex alternatives. A series of models was built 1) to explicitly consider the effects of dimension reduction, 2) to evaluate the inclusion of interaction effects between and among predictors, 3) to evaluate an ensemble prediction method for applications where observations are sparse, 4) to describe how model results can indicate the relative importance of predictors in explaining variance in expected damages and 5) to assess the predictive accuracy of the models based on prescribed metrics. The utility of the workflow was demonstrated on data with characteristics typical of those commonly acquired in ex-post field assessments. The workflow and recommendations from this study aim to provide guidance to researchers and practitioners in the natural hazards community.
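
As a rough illustration of comparing candidate vulnerability models by predictive accuracy, the sketch below fits a logit-link generalized linear model to proportional loss values and compares a main-effects model against one with an interaction term using k-fold cross-validated RMSE. It is a minimal sketch under stated assumptions, not the workflow from the study: the data are synthetic, the predictors (`h` as an intensity proxy, `openings` as a building feature) are hypothetical, and the fractional-logit fit by gradient ascent merely stands in for whichever GLM family and fitting routine one would actually use.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_fractional_logit(X, y, lr=0.5, n_iter=1000):
    """Fit a logit-link GLM to proportions in [0, 1] by gradient ascent
    on the Bernoulli quasi-log-likelihood. Returns [intercept, b1, ...]."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            mu = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi)))
            r = yi - mu  # residual on the response scale
            grad[0] += r
            for j, v in enumerate(xi):
                grad[j + 1] += r * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def predict(beta, X):
    # Predicted loss ratios, always within (0, 1) because of the logit link.
    return [sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], xi))) for xi in X]


def kfold_rmse(X, y, k=5, seed=0):
    # Out-of-sample RMSE pooled over k folds.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    sq_err = 0.0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        beta = fit_fractional_logit([X[i] for i in train], [y[i] for i in train])
        preds = predict(beta, [X[i] for i in test])
        sq_err += sum((p - y[i]) ** 2 for p, i in zip(preds, test))
    return math.sqrt(sq_err / len(X))


def make_synthetic(n=100, seed=1):
    # Synthetic loss ratios; all coefficients here are invented for illustration.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        h = rng.uniform(0.0, 3.0)          # hypothetical intensity proxy
        openings = rng.choice([0.0, 1.0])  # hypothetical building feature
        mu = sigmoid(-3.0 + 1.5 * h + 0.8 * openings)
        y.append(min(1.0, max(0.0, mu + rng.gauss(0.0, 0.05))))
        X.append([h, openings])
    return X, y


X, y = make_synthetic()
# Candidate models: main effects only vs. main effects plus an interaction.
X_int = [[h, o, h * o] for h, o in X]
rmse_main = kfold_rmse(X, y)
rmse_int = kfold_rmse(X_int, y)
print(f"main effects RMSE:     {rmse_main:.3f}")
print(f"with interaction RMSE: {rmse_int:.3f}")
```

The same `kfold_rmse` helper could be reused to score other candidates, for example predictors replaced by principal component scores or predictions averaged over an ensemble of models, so that all alternatives are compared on the same out-of-sample metric.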
