Hyb-Seq: Combining target enrichment and genome skimming for plant phylogenomics

Premise of the study: Hyb-Seq, the combination of target enrichment and genome skimming, allows simultaneous collection of data from low-copy nuclear genes and high-copy genomic targets for studies of plant systematics and evolution.

Methods and Results: Genome and transcriptome assemblies of common milkweed (Asclepias syriaca) were used to design enrichment probes for 3385 exons from 768 genes (>1.6 Mbp), and the enriched libraries were then sequenced on the Illumina platform. Hyb-Seq of 12 individuals (10 Asclepias species and two related genera) yielded at least partial assemblies of 92.6% of exons and 99.7% of genes, with an average assembly length of >2 Mbp. Importantly, complete plastomes and nuclear ribosomal DNA cistrons were also assembled from off-target reads. Phylogenomic analyses revealed conflicting phylogenetic signal between the nuclear and plastid genomes.

Conclusions: The Hyb-Seq approach enables targeted sequencing of thousands of low-copy nuclear exons and their flanking regions, together with genome skimming of high-copy repeats and organellar genomes, to efficiently produce genome-scale data sets for phylogenomics.
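The genome-skimming half of Hyb-Seq rests on a simple observation. Reads that are not captured from a target exon are still sequenced, and because plastomes and nrDNA occur in many copies per cell, even a small off-target fraction can cover them deeply. The Python below is a minimal sketch of splitting reads into on-target and off-target pools under stated assumptions. It treats a read as on-target if it shares at least `min_shared` exact k-mers (default k = 31, a hypothetical choice) with any target exon on either strand. The names `partition_reads`, `build_index`, and `min_shared` are illustrative only and are not part of the study's pipeline or of any published tool.

```python
"""Minimal sketch: split reads into on-target (match a target exon) and
off-target (candidate genome-skimming reads, e.g. plastome or nrDNA).

Hypothetical assumptions, not the study's pipeline: exact k-mer matching,
k = 31 by default, both strands of each target indexed, and a read is
on-target if it shares at least `min_shared` k-mers with any target.
"""
from typing import Dict, Iterable, List, Set, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Iterable[str]:
    """Yield every k-mer of seq (nothing if seq is shorter than k)."""
    for i in range(len(seq) - k + 1):
        yield seq[i : i + k]


def build_index(targets: Dict[str, str], k: int) -> Set[str]:
    """Collect the k-mers of every target exon on both strands."""
    index: Set[str] = set()
    for seq in targets.values():
        seq = seq.upper()
        index.update(kmers(seq, k))
        index.update(kmers(revcomp(seq), k))
    return index


def partition_reads(
    reads: Dict[str, str],
    targets: Dict[str, str],
    k: int = 31,
    min_shared: int = 1,
) -> Tuple[List[str], List[str]]:
    """Return (on_target_ids, off_target_ids), preserving input order."""
    index = build_index(targets, k)
    on_target: List[str] = []
    off_target: List[str] = []
    for read_id, seq in reads.items():
        shared = sum(1 for km in kmers(seq.upper(), k) if km in index)
        if shared >= min_shared:
            on_target.append(read_id)
        else:
            off_target.append(read_id)  # kept for plastome/nrDNA assembly
    return on_target, off_target


if __name__ == "__main__":
    # Toy data: r1 lies on the forward strand of exon1, r2 on its reverse
    # complement, and r3 shares no k-mer with it.
    targets = {"exon1": "ATGGCGTACGTTAGC"}
    reads = {"r1": "GGCGTACGTT", "r2": "AACGTACGCC", "r3": "TTTTTTTTTT"}
    on, off = partition_reads(reads, targets, k=8)
    print(on, off)  # ['r1', 'r2'] ['r3']
```

Exact k-mer matching is used here only to keep the sketch short and dependency-free. Across divergent species, a read aligner that tolerates mismatches would recover on-target reads more sensitively. The off-target pool would then be passed to an assembler to reconstruct high-copy regions such as the plastome.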
