A variable selection package driving Netica with Python

Abstract: Bayesian Networks (BNs) are a useful method for probabilistically modelling environmental systems. BN performance is sensitive to the number of variables included in the model framework, so selecting the optimum set of variables to include in a BN (“variable selection”) is a key part of the BN modelling process. Although variable selection is well covered in the wider BN and machine learning literature, it remains largely absent from environmental BN applications to date, due in large part to a lack of software designed to work with available BN packages. CVNetica_VS is an open-source Python module that extends the functionality of Netica, a commonly used commercial BN software package, to perform variable selection. CVNetica_VS uses wrapper-based variable selection and cross-validation to search for the optimum variable set to use in a BN. The software will help make the development of BNs in environmental applications more objective and more automated.
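To make the idea of wrapper-based variable selection with cross-validation concrete, the following is a minimal sketch of a greedy forward search in plain Python. It is not the CVNetica_VS interface and it does not call Netica: a small categorical naive Bayes classifier stands in for the BN, and every name here (`fit_naive_bayes`, `cv_accuracy`, `forward_select`, the toy variables `wave_height` and `noise`) is hypothetical and chosen only for illustration. The search strategy and scoring metric used by CVNetica_VS itself may differ.

```python
from collections import Counter, defaultdict
import math


def fit_naive_bayes(rows, labels, variables):
    """Stand-in model (assumption): categorical naive Bayes with add-one smoothing."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (variable, class) -> counts of each value
    levels = defaultdict(set)            # variable -> set of values seen in training
    for row, y in zip(rows, labels):
        for v in variables:
            value_counts[(v, y)][row[v]] += 1
            levels[v].add(row[v])

    def predict(row):
        best, best_lp = None, -math.inf
        for y, n_y in sorted(class_counts.items()):
            lp = math.log(n_y)
            for v in variables:
                count = value_counts[(v, y)][row[v]]
                lp += math.log((count + 1) / (n_y + len(levels[v]) + 1))
            if lp > best_lp:
                best, best_lp = y, lp
        return best

    return predict


def cv_accuracy(rows, labels, variables, k=5):
    """k-fold cross-validated accuracy using deterministic interleaved folds."""
    n = len(rows)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_rows = [rows[i] for i in range(n) if i not in test_idx]
        train_labels = [labels[i] for i in range(n) if i not in test_idx]
        predict = fit_naive_bayes(train_rows, train_labels, variables)
        correct += sum(predict(rows[i]) == labels[i] for i in test_idx)
    return correct / n


def forward_select(rows, labels, candidates, k=5, min_gain=1e-9):
    """Greedy wrapper: add the variable that most improves the CV score, stop when none does."""
    selected = []
    best_score = cv_accuracy(rows, labels, selected, k)  # baseline: majority class
    remaining = list(candidates)
    while remaining:
        scored = [(cv_accuracy(rows, labels, selected + [v], k), v) for v in remaining]
        score, var = max(scored, key=lambda t: t[0])
        if score - best_score <= min_gain:
            break
        selected.append(var)
        remaining.remove(var)
        best_score = score
    return selected, best_score


# Toy data (made up): the outcome depends on wave_height only; noise is unrelated.
rows = [
    {"wave_height": "high" if (i // 2) % 2 == 0 else "low",
     "noise": "a" if i % 2 == 0 else "b"}
    for i in range(40)
]
labels = ["erosion" if r["wave_height"] == "high" else "stable" for r in rows]

selected, score = forward_select(rows, labels, ["noise", "wave_height"])
print(selected, score)
```

The defining feature of a wrapper approach is that each candidate variable set is scored by actually training and testing the model, here through `cv_accuracy`, rather than by a model-independent filter statistic. The cost is one model fit per fold per candidate set, which is why a greedy search is used in this sketch instead of an exhaustive one.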
