Phylogeography by diffusion on a sphere: whole world phylogeography

Background
Techniques for reconstructing geographical history along a phylogeny can answer many questions of interest about the geographical origins of species. Bayesian models based on the assumption that taxa move through a diffusion process have found many applications. However, these methods rely on diffusion processes on a plane and do not take the spherical nature of our planet into account. An analysis that covers the whole world therefore ignores the distortions caused by map projections such as the Mercator projection.

Results
In this paper, we introduce a Bayesian phylogeographical method based on diffusion on a sphere. When the area from which taxa are sampled is small, a sphere can be approximated by a plane, and the model yields the same inferences as models using diffusion on a plane. For taxa sampled from the whole world, we obtain substantial differences. We present an efficient algorithm for performing inference within a Markov chain Monte Carlo (MCMC) framework, and show applications to small and large sample areas. We compare results between planar and spherical diffusion in a simulation study and apply the method by inferring the origin of Hepatitis B based on sequences sampled from Eurasia and Africa.

Conclusions
We describe a framework for performing phylogeographical inference that is suitable when the distortion introduced by map projections is large, but that also works well on a smaller scale. The framework allows tips to be sampled from regions, which is useful when the exact sample location is unknown, and allows prior information to be placed on the locations of clades in the tree. The method is implemented in the GEO_SPHERE package in BEAST 2, which is open source, licensed under the LGPL, and allows joint tree and geography inference under a wide range of models.
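To make the projection distortion concrete, the following minimal Python sketch compares the great-circle distance between two points with the distance obtained by treating latitude and longitude as flat x/y coordinates. It is an illustration only and not part of the GEO_SPHERE implementation; the function names, the haversine formula, the equirectangular flat map, and the mean Earth radius of 6371 km are all assumptions chosen for the example.

```python
import math

# Assumed mean Earth radius in km; used only for this illustration.
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def planar_km(lat1, lon1, lat2, lon2):
    """Euclidean distance when latitude/longitude are treated as flat x/y
    coordinates (an equirectangular map), as a planar model implicitly does."""
    dx = math.radians(lon2 - lon1) * EARTH_RADIUS_KM
    dy = math.radians(lat2 - lat1) * EARTH_RADIUS_KM
    return math.hypot(dx, dy)


if __name__ == "__main__":
    # Small area near the equator: the two distances nearly coincide.
    print(great_circle_km(0.0, 0.0, 0.1, 0.1), planar_km(0.0, 0.0, 0.1, 0.1))
    # Large span at high latitude: the flat map overstates the distance.
    print(great_circle_km(60.0, 0.0, 60.0, 90.0), planar_km(60.0, 0.0, 60.0, 90.0))
```

For a small area near the equator the two distances agree closely, which matches the statement that a sphere can be approximated by a plane there. For two points at 60°N separated by 90° of longitude, the flat-map distance is more than twice the great-circle distance, which is the kind of distortion a planar diffusion model does not account for.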
