Allele Phasing Greatly Improves the Phylogenetic Utility of Ultraconserved Elements

Advances in high-throughput sequencing now allow relatively easy and affordable sequencing of large portions of the genome, even for non-model organisms. Many phylogenetic studies reduce costs by focusing sequencing effort on a selected set of targeted loci, commonly enriched using sequence capture. The advantage of this approach is that it recovers a consistent set of loci, each with high sequencing depth, which increases confidence in the assembly of target sequences. High sequencing depth can also be used to identify phylogenetically informative allelic variation within sequenced individuals, yet allele sequences are rarely assembled in phylogenetic studies. Instead, many researchers perform their phylogenetic analyses on contig sequences, which result from the de novo assembly of sequencing reads and contain only canonical nucleobases (A, C, G, T). Because each heterozygous position in such a contig is collapsed to a single base, this practice may reduce both statistical power and phylogenetic accuracy. Here, we develop an easy-to-use pipeline to recover allele sequences from sequence capture data, and we use simulated and empirical data to demonstrate the utility of integrating these allele sequences into analyses performed under the Multispecies Coalescent (MSC) model. Our empirical analyses of Ultraconserved Element (UCE) data collected from the South American hummingbird genus Topaza demonstrate that phased allele sequences carry sufficient phylogenetic information to infer the genetic structure, lineage divergence, and biogeographic history of a genus that diversified during the last three million years. The phylogenetic results support the recognition of two species and suggest high rates of gene flow across large distances of rainforest habitat but rare admixture across the Amazon River. Our simulations provide evidence that analyzing allele sequences yields more accurate estimates of tree topology and divergence times than the more common approach of using contig sequences.
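To make the contrast between contig sequences and allele sequences concrete, here is a minimal Python sketch; it is not the pipeline described in this paper. It assumes reads are already aligned to shared contig coordinates with no indels ('-' marks positions a read does not cover), flags a site as heterozygous when the second most common base reaches a hypothetical threshold `min_frac` of coverage, and phases naively by keeping only reads that span every heterozygous site and taking the two most frequent allele combinations as the two haplotypes of a diploid individual. All function names and the toy reads are illustrative; dedicated read-backed phasing tools handle partial read overlaps, sequencing error, and indels far more carefully.

```python
from collections import Counter


def column_counts(reads, i):
    """Count the bases observed at position i, ignoring uncovered positions ('-')."""
    return Counter(r[i] for r in reads if r[i] != "-")


def consensus_contig(reads):
    """Majority-rule contig: one canonical base per site.

    Heterozygous variation is collapsed; ties go to the first base seen,
    and positions with no coverage become 'N'.
    """
    out = []
    for i in range(len(reads[0])):
        counts = column_counts(reads, i)
        out.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(out)


def heterozygous_sites(reads, min_frac=0.2):
    """Positions where the second most common base reaches min_frac of coverage.

    min_frac is a hypothetical threshold chosen for illustration.
    """
    sites = []
    for i in range(len(reads[0])):
        counts = column_counts(reads, i)
        top = counts.most_common(2)
        if len(top) == 2 and top[1][1] / sum(counts.values()) >= min_frac:
            sites.append(i)
    return sites


def phase_two_alleles(reads, min_frac=0.2):
    """Naive read-backed phasing of one diploid locus into two allele sequences.

    Simplifying assumptions: reads share contig coordinates with no indels,
    only reads spanning every heterozygous site are used, and the two most
    frequent allele combinations at those sites are taken as the haplotypes.
    """
    contig = consensus_contig(reads)
    sites = heterozygous_sites(reads, min_frac)
    combos = Counter(
        tuple(r[i] for i in sites)
        for r in reads
        if all(r[i] != "-" for i in sites)
    )
    alleles = []
    for combo, _count in combos.most_common(2):
        seq = list(contig)
        for pos, base in zip(sites, combo):
            seq[pos] = base  # overwrite the collapsed base with this haplotype's base
        alleles.append("".join(seq))
    if len(alleles) == 1:  # homozygous locus: both alleles equal the contig
        alleles.append(alleles[0])
    return contig, alleles


# Toy locus: two haplotypes differing at positions 2 (G/A) and 6 (G/A).
example_reads = [
    "ACGTTAGC",
    "ACATTAAC",
    "ACGTTAGC",
    "ACATTAAC",
    "--GTTAGC",
    "ACATT---",  # does not span position 6, so it is not used for phasing
]

contig, alleles = phase_two_alleles(example_reads)
print(contig)   # majority-rule contig
print(alleles)  # phased allele sequences
```

In this toy locus the majority-rule contig keeps only one base at each heterozygous site, so the second allele (`ACATTAAC`) is absent from any contig-based analysis, while the phased output recovers both sequences.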
