Assessment of Catastrophic Risk Using Bayesian Network Constructed from Domain Knowledge and Spatial Data

Predicting natural disasters and their consequences is difficult because of the uncertainty and complexity of multiple interrelated factors. This article explores the use of domain knowledge and spatial data to construct a Bayesian network (BN) that integrates multiple factors and quantifies their uncertainties within a consistent system for assessing catastrophic risk. We chose a BN because it can merge data from multiple sources with domain knowledge in a consistent system, learn from a data set, perform inference when some data are missing, and support decision making. A key advantage of our methodology is that it combines domain knowledge with learning from data to construct a robust network. To improve the assessment, we employ spatial data analysis and data mining to extend the training data set, select risk factors, and fine-tune the network. Another major advantage of our methodology is the integration of an optimal discretizer, an informative feature selector, learners, search strategies for local topologies, and Bayesian model averaging. Together, these techniques contribute to robust prediction of the risk probability of natural disasters. In a flood disaster case study, our methodology achieved a higher probability of detection of high risk, higher precision, and a larger area under the ROC curve than the other methods compared, under both cross-validation and prediction of catastrophic risk from historical data. Our results suggest that BNs are a good alternative for risk assessment and a useful decision tool for managing catastrophic risk.
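Two of the mechanisms named above, BN inference when some data are missing and Bayesian model averaging over candidate structures, can be illustrated on a toy network. This is a minimal sketch under stated assumptions and not the authors' implementation. The variables (`rain`, `low`, `risk`), the two candidate structures, every probability, and the model weights are hypothetical placeholders. Exact enumeration stands in for whatever inference engine the study used. An unobserved variable is handled by summing it out. Model averaging weights each structure's prediction by its posterior probability: P(risk | e) = Σ_k P(risk | e, M_k) P(M_k | D).

```python
from itertools import product

# Hypothetical binary flood-risk network (illustrative numbers only, not from the study):
#   rain = heavy rainfall, low = low-lying terrain, risk = high flood risk.
P_RAIN = {True: 0.3, False: 0.7}  # prior P(rain)
P_LOW = {True: 0.4, False: 0.6}   # prior P(low)

# Candidate structure A (hypothetical): rain -> risk <- low.
# Maps (rain, low) to P(risk=True | rain, low).
CPT_A = {
    (True, True): 0.9,
    (True, False): 0.5,
    (False, True): 0.3,
    (False, False): 0.05,
}

# Candidate structure B (hypothetical): rain -> risk only, so 'low' has no effect.
CPT_B = {
    (True, True): 0.7, (True, False): 0.7,
    (False, True): 0.1, (False, False): 0.1,
}


def joint(rain, low, risk, cpt):
    """Joint probability P(rain, low, risk) under one candidate structure."""
    p_risk = cpt[(rain, low)]
    return P_RAIN[rain] * P_LOW[low] * (p_risk if risk else 1.0 - p_risk)


def posterior_high_risk(evidence, cpt):
    """P(risk=True | evidence) by enumeration; unobserved variables are summed out."""
    num = den = 0.0
    for rain, low, risk in product([True, False], repeat=3):
        state = {"rain": rain, "low": low, "risk": risk}
        # Skip states that contradict the observed evidence.
        if any(state[name] != value for name, value in evidence.items()):
            continue
        p = joint(rain, low, risk, cpt)
        den += p
        if risk:
            num += p
    return num / den


def bma_posterior(evidence, models):
    """Bayesian model averaging over (cpt, weight) pairs; weights are normalized to sum to 1."""
    total = sum(weight for _, weight in models)
    return sum(weight * posterior_high_risk(evidence, cpt) for cpt, weight in models) / total


if __name__ == "__main__":
    evidence = {"rain": True}  # terrain is unobserved (missing data)
    print(posterior_high_risk(evidence, CPT_A))
    print(posterior_high_risk(evidence, CPT_B))
    # Hypothetical posterior model probabilities P(M_k | D).
    print(bma_posterior(evidence, [(CPT_A, 0.6), (CPT_B, 0.4)]))
```

With only rainfall observed, structure A gives about 0.66 (0.4 × 0.9 + 0.6 × 0.5), structure B gives 0.70, and the weighted average is about 0.676. Averaging in this way hedges against committing to a single learned topology, which is the motivation for model averaging in general.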
