A comparison of methods for discretizing continuous variables in Bayesian Networks

Abstract: Bayesian Networks (BNs) are an increasingly popular method for modelling environmental systems. Using a BN often requires continuous variables to be discretized. There are three main methods of discretization: manual, unsupervised, and supervised. Here, we compare and demonstrate each approach with a BN that predicts coastal erosion. Results show that supervised discretization produced BNs with the highest average predictive skill (73.8%), followed by manual discretization (69.0%) and unsupervised discretization (64.8%). However, each method has specific advantages that may make it more suitable for particular applications. Manual methods can produce physically meaningful BNs, which is favorable in environmental modelling. Supervised methods can autonomously and optimally discretize variables and may be preferred when predictive skill is a modelling priority. Unsupervised methods are computationally simple and versatile. The choice of discretization scheme should consider both the performance and the practicality of the scheme.
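To make the three families concrete, the following minimal Python sketch discretizes one continuous variable in each way. It is an illustration under stated assumptions, not the procedure used in the paper: the sample values, the class labels, the manual thresholds, and all function names are hypothetical, and the supervised step is only a single entropy-minimizing split (in the spirit of entropy-based methods) with no stopping criterion.

```python
import numpy as np

def equal_width_edges(values, n_bins):
    """Unsupervised: bins of equal width between the min and max."""
    values = np.asarray(values, dtype=float)
    return np.linspace(values.min(), values.max(), n_bins + 1)

def equal_frequency_edges(values, n_bins):
    """Unsupervised: bins holding roughly equal numbers of samples."""
    values = np.asarray(values, dtype=float)
    return np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1))

def assign_bins(values, edges):
    """Map each value to a bin index 0..n_bins-1 using the interior edges."""
    return np.digitize(values, edges[1:-1], right=False)

def entropy(labels):
    """Shannon entropy (bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_entropy_cut(values, labels):
    """Supervised: the single cut point minimizing weighted class entropy."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(values)
    v, y = values[order], labels[order]
    best_cut, best_score = None, np.inf
    for i in range(1, len(v)):
        if v[i] == v[i - 1]:
            continue  # no cut is possible between identical values
        score = (i * entropy(y[:i]) + (len(v) - i) * entropy(y[i:])) / len(v)
        if score < best_score:
            best_cut, best_score = (v[i - 1] + v[i]) / 2.0, score
    return best_cut

# Hypothetical, skewed sample (e.g. a wave-height-like variable) and a
# hypothetical binary outcome used only by the supervised method.
wave_height = np.array([0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.2, 1.5, 4.0])
eroded = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1])

# Manual: thresholds chosen by the modeller (assumed values, for illustration).
manual_edges = np.array([0.0, 1.0, 2.5, np.inf])

manual_counts = np.bincount(assign_bins(wave_height, manual_edges), minlength=3)
width_counts = np.bincount(
    assign_bins(wave_height, equal_width_edges(wave_height, 3)), minlength=3)
freq_counts = np.bincount(
    assign_bins(wave_height, equal_frequency_edges(wave_height, 3)), minlength=3)
supervised_cut = best_entropy_cut(wave_height, eroded)

print("manual bin counts:         ", manual_counts)
print("equal-width bin counts:    ", width_counts)
print("equal-frequency bin counts:", freq_counts)
print("supervised cut point:      ", supervised_cut)
```

On skewed data like this sample, equal-width binning can leave some bins nearly empty while equal-frequency binning balances the counts. The supervised cut depends on the outcome variable rather than on the distribution of the predictor alone.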
