Dsuite - fast D-statistics and related admixture evidence from VCF files

Summary: The D-statistic, also known as the ABBA-BABA statistic, and related statistics are commonly used to assess evidence of gene flow between populations or closely related species. Although the calculations are not computationally intensive, currently available implementations require custom file formats, and they make it impractical to evaluate all gene flow hypotheses across datasets that include many populations or species. Dsuite is a fast C++ implementation that allows genome-scale calculation of the D-statistic across all combinations of tens or even hundreds of populations or species, directly from a variant call format (VCF) file. Furthermore, the program can estimate the admixture fraction and provide evidence of whether introgression is confined to specific loci. Thus, Dsuite facilitates the assessment of gene flow across large genomic datasets.

Availability and implementation: Source code and documentation are available at https://github.com/millanek/Dsuite
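To make the D-statistic concrete, the following is a minimal Python sketch of the standard ABBA-BABA calculation from per-site derived allele frequencies in three populations P1, P2, and P3. It is not Dsuite's code: the function name `d_statistic` and the input layout are hypothetical, and the sketch assumes the outgroup is fixed for the ancestral allele at every site.

```python
import math
from typing import Iterable, Tuple


def d_statistic(sites: Iterable[Tuple[float, float, float]]) -> float:
    """Return Patterson's D for the trio ((P1, P2), P3).

    Each site is (p1, p2, p3): derived allele frequencies in P1, P2, P3.
    Assumption: the outgroup carries only the ancestral allele.
    """
    abba = 0.0  # derived allele shared by P2 and P3
    baba = 0.0  # derived allele shared by P1 and P3
    for p1, p2, p3 in sites:
        abba += (1.0 - p1) * p2 * p3
        baba += p1 * (1.0 - p2) * p3
    total = abba + baba
    if total == 0.0:
        return math.nan  # no informative sites
    return (abba - baba) / total


# One pure ABBA site and one pure BABA site cancel out.
print(d_statistic([(0.0, 1.0, 1.0), (1.0, 0.0, 1.0)]))  # 0.0
```

A D close to zero is what is expected without gene flow, while an excess of ABBA (positive D) or BABA (negative D) patterns suggests gene flow between P3 and P2 or P1, respectively. Judging significance additionally requires a standard error, typically obtained by block jackknife, which this sketch omits.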
