To support more robust provenance in species distribution modelling, we undertook an extensive literature
search for the typical default values, and the range of values, of the configuration settings of the most
commonly used statistical algorithms for constructing species distribution models (SDMs), as implemented in
R packages (such as dismo and biomod2) or in other species distribution modelling programs such as Maxent.
We found that SDM algorithm configuration option settings are rarely documented in the SDM literature and
that, when settings are reported, the justification for them is minimal. Such settings were often the R
default values, or were the result of trial and error.
This is potentially concerning for several reasons. It detracts from the robustness of the provenance of
such SDM studies, and a lack of documentation of configuration option settings in a paper prevents replication
of the experiment, which contravenes one of the main tenets of the scientific method. Inappropriate or
uninformed configuration option settings are particularly concerning if they represent a poorly understood
ecological variable or process and if the algorithm is sensitive to such settings, since this could result in
erroneous or unrealistic SDMs.
We tested the sensitivity of two commonly used SDM algorithms to variation in configuration option settings:
Random Forests and Boosted Regression Trees. A process of expert elicitation was used to derive a range of
appropriate values with which to test the sensitivity of these algorithms. We used species occurrence
records for the koala (Phascolarctos cinereus) in our sensitivity tests, since the species has a well-known
distribution. Results were assessed by comparing the geospatial distribution predicted by each sensitivity-test
(i.e. altered-settings) SDM with that of the control SDM (i.e. default settings) in a geographical
information system (QGIS). In addition, two performance measures were used to compare
the altered-setting SDMs to the control. The aim of our study was to draw conclusions about how
reliable reported SDM results may be, given the sensitivity of their algorithms to certain settings, the
often arbitrary nature of such settings, and the lack of awareness of, or attention to, this issue in most of
the published SDM literature. Our results indicate that both algorithms tested were sensitive to alternative
values for some of their settings. This study has therefore shown that the choice of configuration option
settings in Random Forests and Boosted Regression Trees affects the results, and that assigning
suitable values for these settings is a relevant consideration; the settings should therefore always be
published along with the model.
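
As an illustration only, the control-versus-altered-settings design described above can be sketched in a few lines of Python. This is not the workflow used in the study, which relied on R implementations and QGIS. The setting names (n_trees, max_features, min_node_size), their values, and the two comparison measures (mean absolute difference and the fraction of grid cells whose presence/absence class changes at a 0.5 threshold) are all assumptions chosen for the example.

```python
# Hypothetical control (default) settings and candidate alternatives.
# Names and values are illustrative and not taken from any specific package.
CONTROL = {"n_trees": 500, "max_features": 3, "min_node_size": 1}
CANDIDATES = {
    "n_trees": [100, 1000],
    "max_features": [2, 5],
}


def one_at_a_time(control, candidates):
    """Yield (label, settings) pairs that differ from the control in one setting."""
    for name, values in candidates.items():
        for value in values:
            if value == control[name]:
                continue  # identical to the control, nothing to test
            settings = dict(control)
            settings[name] = value
            yield f"{name}={value}", settings


def compare_to_control(control_map, altered_map, threshold=0.5):
    """Compare two suitability grids (lists of rows, values in [0, 1]).

    Returns the mean absolute difference per cell and the fraction of cells
    whose thresholded presence/absence class differs from the control.
    """
    if len(control_map) != len(altered_map):
        raise ValueError("grids must have the same number of rows")
    diffs = []
    flipped = 0
    for row_c, row_a in zip(control_map, altered_map):
        if len(row_c) != len(row_a):
            raise ValueError("grid rows must have the same length")
        for c, a in zip(row_c, row_a):
            diffs.append(abs(c - a))
            if (c >= threshold) != (a >= threshold):
                flipped += 1
    n = len(diffs)
    return {"mean_abs_diff": sum(diffs) / n, "frac_flipped": flipped / n}
```

In use, each settings dictionary produced by one_at_a_time would be passed to whichever model-fitting routine is in use, and the resulting prediction grid compared with the control grid. Storing the settings dictionary alongside each output is one simple way to record the configuration for later replication.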