A data mining approach to predictive vegetation mapping using probabilistic graphical models

Abstract This paper develops a novel method to model and predict the spatial distribution of vegetation types in Swaziland from physiographic and bioclimatic variables. The method takes a data mining approach, implemented within a probabilistic graphical model, to match two observed hierarchical levels of vegetation. Classification uses Bayesian networks (BN), parameterized with the expectation-maximization (EM) algorithm. The model was tested on a random sample of mapped vegetation types in Swaziland and allowed the identification of the environmental variables most important for capturing the spatial distribution of vegetation. We show that while elevation and geology are the most important variables explaining the spatial distribution of vegetation in both models, the influence of climatic and other variables differs between the two levels. The overall distribution of the predicted vegetation classes was very similar to their distribution on the observed vegetation map. The overall error rate was 9.35% for the model with 16 vegetation classes and 4.9% for the model with 5 classes, indicating high classification accuracy despite the complex landscape of the study area. Possible sources of error and some limitations are discussed, and conclusions are drawn, including suggestions for further investigation.
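
To make the classification and error-rate ideas concrete, the following is a minimal sketch, not the paper's model. It assumes the simplest possible Bayesian network (a naive Bayes structure in which the vegetation class is the sole parent of each environmental variable) and complete data, in which case EM reduces to counting with smoothing. The variable names, the two-class toy sample, and the smoothing constant `alpha` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels, alpha=1.0):
    """Estimate class priors and per-variable conditional counts.

    With complete data, maximum-likelihood (and hence EM) estimation of the
    conditional probability tables reduces to counting; alpha adds Laplace
    smoothing so unseen combinations keep a non-zero probability.
    """
    n_vars = len(rows[0])
    # value_counts[j][c][v] = number of class-c rows where variable j equals v
    value_counts = [defaultdict(Counter) for _ in range(n_vars)]
    domains = [set() for _ in range(n_vars)]
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[j][c][v] += 1
            domains[j].add(v)
    return {
        "class_counts": Counter(labels),
        "value_counts": value_counts,
        "domains": domains,
        "alpha": alpha,
        "n": len(labels),
    }


def predict(model, row):
    """Return the class with the highest posterior log-probability."""
    best_class, best_score = None, -math.inf
    alpha = model["alpha"]
    for c, n_c in model["class_counts"].items():
        score = math.log(n_c / model["n"])  # log prior
        for j, v in enumerate(row):
            k = len(model["domains"][j])  # number of states of variable j
            count = model["value_counts"][j][c][v]
            score += math.log((count + alpha) / (n_c + alpha * k))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def error_rate(model, rows, labels):
    """Fraction of samples whose predicted class differs from the mapped class."""
    wrong = sum(predict(model, r) != y for r, y in zip(rows, labels))
    return wrong / len(labels)


# Hypothetical toy sample: (elevation band, geology) -> vegetation class.
rows = [
    ("high", "granite"), ("high", "granite"), ("high", "basalt"),
    ("low", "basalt"), ("low", "basalt"), ("low", "granite"),
]
labels = ["grassland", "grassland", "grassland",
          "savanna", "savanna", "savanna"]

model = fit_naive_bayes(rows, labels)
print(predict(model, ("high", "granite")))
print(error_rate(model, rows, labels))
```

The error rate here is computed on the training sample purely for brevity; an evaluation comparable to the one reported above would use a held-out random sample of mapped vegetation types.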
