SpeciesGeoCoder: Fast Categorization of Species Occurrences for Analyses of Biodiversity, Biogeography, Ecology, and Evolution

Understanding the patterns and processes underlying the uneven distribution of biodiversity across space is a major scientific challenge in systematic biology and biogeography. Meeting it depends largely on effectively mapping and interpreting rapidly growing species occurrence data. There is thus an urgent need to make the coding of species into spatial units faster, as well as automated, transparent, and reproducible. Here we present SpeciesGeoCoder, an open-source software package written in Python and R that allows for easy coding of species into user-defined operational units. These units may be of any size. They may be purely spatial (i.e., polygons), such as countries and states, conservation areas, biomes, islands, biodiversity hotspots, and areas of endemism, but they may also include elevation ranges. This flexibility allows species to be scored into complex categories, such as those encountered in topographically and ecologically heterogeneous landscapes. In addition, SpeciesGeoCoder can be used to facilitate sorting and cleaning of occurrence data obtained from online databases, and to test the impact of incorrect specimen identification on the spatial coding of species. The outputs of SpeciesGeoCoder include quantitative biodiversity statistics, global and local distribution maps, and files that can be used directly in many phylogeny-based applications for ancestral range reconstruction, investigations of biome evolution, and other comparative methods. Our simulations indicate that even datasets containing hundreds of millions of records can be analyzed in a relatively short time on a standard computer. We exemplify the use of SpeciesGeoCoder by inferring the historical dispersal of birds across the Isthmus of Panama. The results show that lowland species crossed the Isthmus about twice as frequently as montane species, with a marked increase in the number of dispersals during the last 10 million years.
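The core operation described above scores each occurrence record into user-defined polygons and, optionally, elevation ranges, producing a species-by-unit presence/absence matrix. The plain-Python sketch below illustrates that operation. It is not SpeciesGeoCoder's actual code or API. The function names, the `(species, lon, lat, elevation_m)` record layout, the use of an even-odd (ray-casting) point-in-polygon test, and the 1000 m lowland/montane cut-off are all illustrative assumptions. Coordinates are treated as planar, with no antimeridian or pole handling.

```python
# Illustrative sketch only: hypothetical function names and record layout,
# not the SpeciesGeoCoder API. Treats lon/lat as planar coordinates.
from collections import defaultdict

Point = tuple[float, float]


def point_in_polygon(lon: float, lat: float, poly: list[Point]) -> bool:
    """Even-odd (ray-casting) test for a simple polygon given as (lon, lat) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > lat) != (y2 > lat):
            # Longitude where this edge meets that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:  # crossing lies east of the point: toggle parity
                inside = not inside
    return inside


def elevation_class(elev_m: float, breaks: list[tuple[float, str]]) -> str:
    """Return the label of the first class whose upper bound exceeds elev_m."""
    for upper, label in breaks:
        if elev_m < upper:
            return label
    return breaks[-1][1]  # above every bound: fall back to the highest class


def code_occurrences(records, areas, breaks=None):
    """Score species as present (1) or absent (0) in each operational unit.

    records: iterable of (species, lon, lat, elevation_m)
    areas:   dict of area name -> polygon
    breaks:  optional sorted (upper_bound_m, label) pairs; if given, each unit
             is an "area:elevation-class" combination.
    Returns (units, matrix, richness, unassigned).
    """
    if breaks is None:
        units = sorted(areas)
    else:
        units = [f"{a}:{label}" for a in sorted(areas) for _, label in breaks]

    present = defaultdict(set)  # species -> units it was recorded in
    species_seen = set()
    unassigned = []  # records outside every polygon (candidates for data cleaning)
    for species, lon, lat, elev in records:
        species_seen.add(species)
        hit = False
        for name, poly in areas.items():
            if point_in_polygon(lon, lat, poly):
                hit = True
                unit = name if breaks is None else f"{name}:{elevation_class(elev, breaks)}"
                present[species].add(unit)
        if not hit:
            unassigned.append((species, lon, lat, elev))

    # One 0/1 row per species, one column per unit.
    matrix = {sp: [int(u in present[sp]) for u in units] for sp in sorted(species_seen)}
    # Species richness per unit = column sums of the matrix.
    richness = {u: sum(row[j] for row in matrix.values()) for j, u in enumerate(units)}
    return units, matrix, richness, unassigned


# Usage with two adjacent square "areas" and a hypothetical 1000 m cut-off.
areas = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
records = [("sp1", 5, 5, 200), ("sp1", 15, 5, 2500), ("sp2", 12, 3, 100), ("sp3", 50, 50, 0)]
breaks = [(1000, "lowland"), (float("inf"), "montane")]
units, matrix, richness, unassigned = code_occurrences(records, areas, breaks)
# units  -> ['A:lowland', 'A:montane', 'B:lowland', 'B:montane']
# matrix -> {'sp1': [1, 0, 0, 1], 'sp2': [0, 0, 1, 0], 'sp3': [0, 0, 0, 0]}
```

Each matrix row is a 0/1 range vector of the kind used as input to ancestral-range reconstruction. The `unassigned` list flags records that fall outside every unit, which is one simple hook for the data cleaning mentioned above. Because every record is tested against every polygon, the cost of this naive version grows with records × polygons × vertices. Bounding-box prefiltering or a spatial index is a common way to scale it to very large datasets.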
