Species Distribution Modeling with Scalability: The Case Study of P-GARP, a Parallel Genetic Algorithm for Rule-Set Production

Species distribution modeling (SDM) estimates a species' probabilistic distribution by combining environmental raster layers with species occurrence datasets. Such models can help answer complex questions in ecology, biology, and health, e.g., by estimating the impact of climate change on biodiversity or the potential for the spread of a disease (vector modeling). Machine learning is widely applied in SDM, with the Genetic Algorithm for Rule-set Production (GARP) being one of the most reliable solutions. However, GARP's convergence needs to be sped up under certain conditions (high resolution or a large number of layers). To that end, this paper proposes P-GARP, a parallel, scalable implementation of GARP. P-GARP was implemented on an SGI Altix XE 1300 cluster with 2 quad-core processors per node. Preliminary results show a significant speedup of 3.2 per node. Premature convergence is not observed in P-GARP, and its accuracy is very similar to GARP's. Effective solutions to improve this speedup at even larger scale are proposed, along with a discussion of P-GARP's correctness and efficiency.
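
To put the reported speedup in context, the short sketch below computes the parallel efficiency it would imply. It is an illustration only: it assumes that all 8 cores of a node (2 quad-core processors) took part in the measured run, which the abstract does not state, and the function name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Return speedup divided by worker count (1.0 means ideal linear scaling)."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


# Figures taken from the abstract: 2 quad-core processors per node
# and a speedup of 3.2 per node.
CORES_PER_NODE = 2 * 4
SPEEDUP_PER_NODE = 3.2

# Assumption (not stated in the abstract): all 8 cores of the node were used.
efficiency = parallel_efficiency(SPEEDUP_PER_NODE, CORES_PER_NODE)
print(f"Implied per-node efficiency: {efficiency:.2f}")
```

Under that assumption, the implied efficiency is well below the ideal value of 1.0, which is consistent with the abstract's remark that further improvements to the speedup are proposed.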
