CPGAVAS2, an integrated plastome sequence annotator and analyzer

Abstract: We previously developed a web server, CPGAVAS, for the annotation, visualization and GenBank submission of plastome sequences. Here, we upgrade the server to CPGAVAS2 to address the following challenges: (i) inaccurate annotations in reference sequences, which likely propagate errors; (ii) difficulty in annotating the small exons of the genes petB, petD and rps16 and the trans-splicing gene rps12; (iii) lack of annotation and visualization of other genome features, such as repeat elements; and (iv) lack of modules for diversity analysis of plastomes. In particular, CPGAVAS2 provides two reference datasets for plastome annotation. The first dataset contains 43 plastomes whose annotations have been validated or corrected with RNA-seq data. The second contains 2544 plastomes curated by sequence alignment. Two new algorithms are implemented to correctly annotate small exons and trans-splicing genes. Tandem and dispersed repeats are identified and displayed on a circular map together with the annotated genes. DNA-seq and RNA-seq data can be uploaded to identify single-nucleotide polymorphism sites and RNA-editing sites. The results of two case studies show that CPGAVAS2 annotates better than several other servers. CPGAVAS2 will likely become an indispensable tool for plastome research and can be accessed at http://www.herbalgenomics.org/cpgavas2.
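To make the idea of identifying RNA-editing sites from uploaded RNA-seq data more concrete, the following is a minimal sketch of one naive approach, not the method used by CPGAVAS2 (which the abstract does not describe). It assumes RNA-seq reads have already been mapped and summarized as a per-position string of observed bases. The function name `candidate_editing_sites`, the `rna_pileup` dictionary and both thresholds are hypothetical. Only forward-strand C-to-U changes (seen as C-to-T in sequencing reads) are considered; edits in genes on the reverse strand would appear as G-to-A and are ignored here.

```python
from collections import Counter


def candidate_editing_sites(reference, rna_pileup, min_depth=10, min_fraction=0.1):
    """Flag reference C positions where RNA-seq reads support T.

    reference    -- plastome sequence as a string (forward strand)
    rna_pileup   -- hypothetical dict: 1-based position -> string of read bases
    min_depth    -- assumed minimum number of A/C/G/T reads required at a site
    min_fraction -- assumed minimum fraction of T reads to report the site
    Returns a list of (1-based position, T fraction) tuples.
    """
    sites = []
    for index, ref_base in enumerate(reference.upper()):
        if ref_base != "C":
            continue  # only C-to-U (C-to-T) candidates are considered
        position = index + 1
        counts = Counter(rna_pileup.get(position, "").upper())
        depth = sum(counts[base] for base in "ACGT")
        if depth < min_depth:
            continue  # too few reads to make a call
        fraction = counts["T"] / depth
        if fraction >= min_fraction:
            sites.append((position, fraction))
    return sites


if __name__ == "__main__":
    reference = "ACGTC"
    rna_pileup = {2: "TTTTTTTTCC", 5: "CCCCCCCCCC"}
    print(candidate_editing_sites(reference, rna_pileup))
```

A real analysis would also need DNA-seq evidence at the same position to separate editing from genomic polymorphism, which is presumably why the server accepts both data types.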
