Understanding the environmental conditions and geographic areas suitable for a given species is an important component of ecology and conservation biology. A common tool for identifying such areas is species distribution modeling, which uses computer algorithms to predict the spatial distribution of organisms. Most commonly, these models rest on correlative relationships between the organism and environmental variables. The data requirements for this type of modeling are known presence (and possibly absence) locations of the species, together with the values at those locations of the environmental or climatic covariates thought to define the species' habitat suitability. These covariate data are generally extracted from remotely sensed imagery, interpolated or gridded historical climate data, or downscaled climate model output. Traditionally, ecologists and biologists have constructed species distribution models using workflows and data that reside primarily on their local workstations or networks. This approach is becoming harder to sustain as scientists increasingly use these modeling techniques to inform management decisions under different climate change scenarios. The difficulty stems from the fact that remote sensing products, gridded historical climate data, and downscaled climate model outputs are not only increasing in spatial and temporal resolution but also proliferating. In addition, any rigorous assessment of uncertainty requires a computationally intensive sensitivity analysis that accounts for the various sources of uncertainty. The scientists fitting these models generally do not have the computer science background required to take advantage of recent advances in web-service-based data acquisition, remote high-powered data processing, or scientific workflow systems.
Ecological modelers need a tractable platform that abstracts away the computational complexity inherent in the burgeoning field of coupled climate and ecological response modeling. In this paper, we describe the computational challenges in species distribution modeling and solutions that use scientific workflow systems. We focus on the Software for Assisted Habitat Modeling (SAHM), a package within VisTrails, an open-source scientific workflow system.