Global mapping of potential natural vegetation: an assessment of machine learning algorithms for estimating land potential

Potential natural vegetation (PNV) is the vegetation cover in equilibrium with climate that would exist at a given location if not impacted by human activities. PNV is useful for raising public awareness about land degradation and for estimating land potential. This paper presents the results of assessing machine learning algorithms, namely neural networks (nnet package), random forest (ranger), gradient boosting (gbm), k-nearest neighbors (class), and Cubist, for operational mapping of PNV. Three case studies were considered: (1) global distribution of biomes based on the BIOME 6000 data set (8,057 modern pollen-based site reconstructions), (2) distribution of forest tree taxa in Europe based on detailed occurrence records (1,546,435 ground observations), and (3) global monthly fraction of absorbed photosynthetically active radiation (FAPAR) values (30,301 randomly sampled points). A stack of 160 global maps representing biophysical conditions over land, including atmospheric, climatic, relief, and lithologic variables, was used as explanatory variables. The results indicate that random forest gives the best overall performance. The highest accuracy for predicting the 20 BIOME 6000 classes was estimated to be between 33% (with spatial cross-validation) and 68% (simple random sub-setting), with the most important predictors being total annual precipitation, monthly temperatures, and bioclimatic layers. Predicting the 73 forest tree species resulted in a mapping accuracy of 25%, with the most important predictors being monthly cloud fraction, mean annual and monthly temperatures, and elevation. Regression models for FAPAR (monthly images) gave an R-square of 90%, with the most important predictors being total annual precipitation, monthly cloud fraction, CHELSA bioclimatic layers, and month of the year. Further developments of PNV mapping could include using all GBIF records to map the global distribution of plant species at different taxonomic levels. This methodology could also be extended to dynamic modeling of PNV, so that future climate scenarios can be incorporated. Global maps of biomes, FAPAR, and tree species at 1 km spatial resolution are available for download via http://dx.doi.org/10.7910/DVN/QQHCIK.
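
The gap between the 33% (spatial cross-validation) and 68% (simple random sub-setting) accuracy estimates reflects how the validation folds are formed: with random sub-setting, neighboring and therefore similar points can land in both training and test sets, whereas spatial cross-validation holds out whole regions. The following is a minimal Python sketch of the two fold-assignment schemes, not the procedure used in the study (which relied on R packages). The function names, the 10-degree block size, and the synthetic coordinates are all illustrative assumptions.

```python
import random
from collections import defaultdict


def random_folds(n_points, k, seed=0):
    """Simple random sub-setting: folds ignore where a point is located."""
    rng = random.Random(seed)
    folds = [i % k for i in range(n_points)]
    rng.shuffle(folds)
    return folds


def spatial_block_folds(coords, k, block_deg=10.0, seed=0):
    """Spatial CV sketch: all points in one block_deg x block_deg tile share a fold."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for idx, (lon, lat) in enumerate(coords):
        # Tile index from floor division of longitude and latitude.
        key = (int(lon // block_deg), int(lat // block_deg))
        blocks[key].append(idx)
    keys = sorted(blocks)
    rng.shuffle(keys)  # randomize which tile goes to which fold
    folds = [None] * len(coords)
    for j, key in enumerate(keys):
        for idx in blocks[key]:
            folds[idx] = j % k
    return folds


# Hypothetical sample of 200 synthetic (lon, lat) points, not real site data.
_rng = random.Random(42)
coords = [(_rng.uniform(-180, 180), _rng.uniform(-60, 80)) for _ in range(200)]

rf = random_folds(len(coords), k=5)
sf = spatial_block_folds(coords, k=5)
```

Either fold vector can then drive a standard k-fold loop: train on points whose fold differs from the held-out fold and score on the rest. Under the spatial scheme, test points never share a tile with training points, which typically gives a lower but more conservative accuracy estimate.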
