Hybrid Bayesian network classifiers: Application to species distribution models

Bayesian networks are among the most powerful tools for designing expert systems that operate under uncertainty. In practice, however, their application usually requires discretizing the continuous variables. This paper develops naive Bayes (NB) and tree augmented naive Bayes (TAN) classifiers based on Mixtures of Truncated Exponentials (MTEs), which handle discrete and continuous variables simultaneously in the same network without any restriction. The aim is to characterize the habitat of the spur-thighed tortoise (Testudo graeca graeca) using several continuous environmental variables and one discrete (binary) variable representing the presence or absence of the tortoise. The models are compared with their fully discrete counterparts, and the results show a better classification rate for the continuous models. Using continuous models instead of discrete ones therefore avoids the loss of statistical information caused by discretization. Moreover, the continuous TAN model yields a more spatially accurate distribution of the tortoise. The species is found in the Doñana Natural Park and in semiarid habitats. The proposed MTE-based continuous models are valid for predictive species distribution modelling.
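
As a rough illustration of how a naive Bayes classifier can use MTE densities for continuous predictors, the sketch below evaluates a one-dimensional MTE potential of the standard piecewise form a0 + sum_i a_i exp(b_i x) and combines such densities with class priors. It is a minimal sketch and not the implementation used in the paper: the names `MTEPiece`, `mte_density` and `nb_posterior`, the single toy predictor scaled to [0, 1), and all parameter values are assumptions chosen only for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MTEPiece:
    """One interval [lo, hi) of a 1-D MTE potential (hypothetical helper type)."""
    lo: float
    hi: float
    a0: float                                   # constant term
    terms: list = field(default_factory=list)   # list of (a_i, b_i) pairs


def mte_density(pieces, x):
    """Evaluate a0 + sum_i a_i * exp(b_i * x) on the piece containing x."""
    for p in pieces:
        if p.lo <= x < p.hi:
            return p.a0 + sum(a * math.exp(b * x) for a, b in p.terms)
    return 0.0  # x lies outside the support of the potential


def nb_posterior(prior, cond, x):
    """Naive Bayes posterior over classes for a vector x of continuous features.

    prior: dict mapping class -> prior probability
    cond:  dict mapping class -> list with one MTE (list of pieces) per feature
    """
    scores = {
        c: prior[c] * math.prod(mte_density(cond[c][j], xj) for j, xj in enumerate(x))
        for c in prior
    }
    z = sum(scores.values())
    if z == 0.0:
        raise ValueError("x has zero density under every class")
    return {c: s / z for c, s in scores.items()}


# Toy example (made-up parameters): one environmental variable scaled to [0, 1).
k = 2.0 / (math.exp(2.0) - 1.0)  # normalizes k * exp(2x) to integrate to 1 on [0, 1)
cond = {
    "presence": [[MTEPiece(0.0, 1.0, 0.0, [(k, 2.0)])]],  # density increasing in x
    "absence": [[MTEPiece(0.0, 1.0, 1.0)]],               # uniform density
}
prior = {"presence": 0.5, "absence": 0.5}

high = nb_posterior(prior, cond, [0.9])
low = nb_posterior(prior, cond, [0.1])
```

With these toy densities, large values of the predictor favour the presence class and small values favour absence. A TAN classifier would additionally condition each predictor on one other predictor, which this sketch does not attempt.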
