Phylogenomics from Whole Genome Sequences Using aTRAM

Novel sequencing technologies are rapidly expanding the size of data sets that can be applied to phylogenetic studies. Currently, the most commonly used phylogenomic approaches involve some form of genome reduction. While these approaches make assembling phylogenomic data sets more economical for organisms with large genomes, they reduce the genomic coverage and thereby the long‐term utility of the data. For organisms with moderate to small genomes (<1000 Mbp), it is now feasible to sequence the entire genome at modest coverage (10‐30×). Computational challenges for handling these large data sets can be alleviated by assembling targeted reads, rather than assembling the entire genome, to produce a phylogenomic data matrix. Here we demonstrate the use of the automated Target Restricted Assembly Method (aTRAM) to assemble 1107 single‐copy ortholog genes from whole genome sequencing of sucking lice (Anoplura) and out‐groups. We developed a pipeline to extract exon sequences from the aTRAM assemblies by annotating them with respect to the original target protein. We aligned these protein sequences with the inferred amino acids and then performed phylogenetic analyses on both the concatenated matrix of genes and on each gene separately in a coalescent analysis. Finally, we tested the limits of successful assembly in aTRAM by assembling 100 genes from close‐ to distantly related taxa at high to low levels of coverage. Both the concatenated analysis and the coalescent‐based analysis produced the same tree topology, which was consistent with previously published results and resolved weakly supported nodes. These results demonstrate that this approach is successful at developing phylogenomic data sets from raw genome sequencing reads. Further, we found that with coverages above 5‐10×, aTRAM was successful at assembling 80‐90% of the contigs for both close and distantly related taxa. As sequencing costs continue to decline, we expect full genome sequencing will become more feasible for a wider array of organisms, and aTRAM will enable mining of these genomic data sets for an extensive variety of applications, including phylogenomics. [aTRAM; gene assembly; genome sequencing; phylogenomics.]
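
The coverage figures quoted in the abstract (10‐30× sequencing depth, 5‐10× as the lower range tested for assembly) can be related to sequencing effort with the standard mean-coverage relation, coverage = number of reads × read length / genome size. The following is a minimal sketch of that arithmetic only; the function names and the 100 bp read length are illustrative assumptions and are not taken from the study or from aTRAM itself.

```python
def mean_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Expected mean coverage: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage: int, read_length_bp: int, genome_size_bp: int) -> int:
    """Smallest number of reads that reaches the target mean coverage."""
    total_bases = target_coverage * genome_size_bp
    # Integer ceiling division, so the result never falls short of the target.
    return -(-total_bases // read_length_bp)


if __name__ == "__main__":
    genome_size = 1_000_000_000  # 1000 Mbp, the upper genome size named in the abstract
    read_length = 100            # assumed read length in bp (hypothetical)
    for depth in (5, 10, 30):
        n = reads_needed(depth, read_length, genome_size)
        print(f"{depth}x coverage needs about {n:,} reads of {read_length} bp")
```

Under these assumptions, the read count scales linearly with both the target depth and the genome size, which is why whole genome sequencing at modest coverage is most practical for the smaller genomes the abstract describes.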
