PACVr: plastome assembly coverage visualization in R

Background: The circular, quadripartite structure of plastid genomes, which includes two inverted repeat (IR) regions, renders the automatic assembly of plastid genomes challenging. The correct assembly of plastid genomes is a prerequisite for the validity of subsequent analyses of plastid genome structure and evolution. Plastome-based phylogenetic or population genetic investigations, for example, require the precise identification of DNA sequence and length to determine the location of nucleotide polymorphisms. The average coverage depth of a genome assembly is often used as an indicator of assembly quality. Visualizing coverage depth across a draft genome allows users to inspect the quality of the assembly and, where applicable, identify regions of reduced assembly confidence. Based on such visualizations, users can conduct a local re-assembly or other forms of targeted error correction. Few, if any, contemporary software tools can visualize the coverage depth of a plastid genome assembly while taking its quadripartite structure into account, despite the interplay between genome structure and assembly quality. A software tool is needed that visualizes the coverage depth of a plastid genome assembly on a circular, quadripartite map of the plastid genome.

Results: We introduce 'PACVr', an R package that visualizes the coverage depth of a plastid genome assembly in relation to the circular, quadripartite structure of the genome as well as to the individual plastome genes. The tool allows visualizations on different scales using a variable window approach and also visualizes the equality of gene synteny in the inverted repeat regions of the plastid genome, thus providing an additional measure of assembly quality. As a tool for plastid genomics, PACVr provides the functionality to identify regions of coverage depth above or below user-defined threshold values and helps to identify non-identical IR regions. To allow easy integration into bioinformatic workflows, PACVr can be invoked directly from a Unix shell, thus facilitating its use in automated quality control. We illustrate the application of PACVr on two empirical datasets and compare the resulting visualizations with those of alternative software tools for displaying plastome sequencing coverage.

Conclusions: PACVr provides a user-friendly tool to visualize (a) the coverage depth of a plastid genome assembly on a circular, quadripartite plastome map and in relation to individual plastome genes, and (b) the equality of gene synteny in the inverted repeat regions. It thus contributes to optimizing plastid genome assemblies and increasing the reliability of publicly available plastome sequences, especially in light of incongruence among the visualization results of alternative software tools. The software, example datasets, technical documentation, and a tutorial are available with the package at https://github.com/michaelgruenstaeudl/PACVr.
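
The window-based coverage summary, the threshold-based flagging, and the IR synteny check described in the Results can be illustrated with a minimal sketch. It is written in Python for brevity and is not PACVr's R implementation; the function names (`window_coverage`, `flag_windows`, `ir_gene_order_matches`), the use of non-overlapping windows, and the example depths, thresholds, and gene lists are all assumptions made for illustration.

```python
from statistics import mean


def window_coverage(depths, window_size):
    """Mean coverage depth per non-overlapping window.

    `depths` holds one per-base depth value per genome position.
    Returns (start, end, mean_depth) tuples with 1-based, inclusive
    coordinates. The last window may be shorter than `window_size`.
    """
    windows = []
    for start in range(0, len(depths), window_size):
        chunk = depths[start:start + window_size]
        windows.append((start + 1, start + len(chunk), mean(chunk)))
    return windows


def flag_windows(windows, low, high):
    """Label windows whose mean depth is below `low` or above `high`."""
    flagged = []
    for start, end, depth in windows:
        if depth < low:
            flagged.append((start, end, "low"))
        elif depth > high:
            flagged.append((start, end, "high"))
    return flagged


def ir_gene_order_matches(ira_genes, irb_genes):
    """Check that two IR copies carry the same genes in mirrored order.

    Assumption: both lists are read in the same direction along the
    genome, so the gene order of one copy is the reverse of the other.
    """
    return list(ira_genes) == list(reversed(irb_genes))


# Hypothetical per-base depths for a 12 bp toy sequence.
depths = [30, 32, 31, 29, 5, 4, 6, 5, 30, 31, 120, 118]
windows = window_coverage(depths, window_size=4)
flagged = flag_windows(windows, low=10, high=60)
print(flagged)  # [(5, 8, 'low'), (9, 12, 'high')]

# Hypothetical IR gene lists, read along the genome.
print(ir_gene_order_matches(["rpl2", "rpl23", "ycf2"],
                            ["ycf2", "rpl23", "rpl2"]))  # True
```

Changing `window_size` changes the scale of the summary: small windows expose short coverage dips, whereas large windows smooth them out, which is the trade-off a variable window approach lets the user explore.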
