GenomeQC: a quality assessment tool for genome assemblies and gene structure annotations

Background Genome assemblies are foundational for understanding the biology of a species. They provide a physical framework for mapping additional sequences, thereby enabling characterization of, for example, genomic diversity and differences in gene expression across individuals and tissue types. Quality metrics for genome assemblies gauge both the completeness and contiguity of an assembly and help provide confidence in downstream biological insights. To compare quality across multiple assemblies, a set of common metrics are typically calculated and then compared to one or more gold standard reference genomes. While several tools exist for calculating individual metrics, applications providing comprehensive evaluations of multiple assembly features are, perhaps surprisingly, lacking. Here, we describe a new toolkit that integrates multiple metrics to characterize both assembly and gene annotation quality in a way that enables comparison across multiple assemblies and assembly types. Findings Our application, named GenomeQC, is an easy-to-use and interactive web framework that integrates various quantitative measures to characterize genome assemblies and annotations. GenomeQC provides researchers with a comprehensive summary of these statistics and allows for benchmarking against gold standard reference assemblies. Conclusions The GenomeQC web application is implemented in R/Shiny version 1.5.9 and Python 3.6 and is freely available at https://genomeqc.maizegdb.org/ under the GPL license. All source code and a containerized version of the GenomeQC pipeline is available in the GitHub repository https://github.com/HuffordLab/GenomeQC.

[1]  Andrey D. Prjibelski,et al.  Icarus: visualizer for de novo assembly evaluation , 2016, Bioinform..

[2]  Lucian Ilie,et al.  LASER: Large genome ASsembly EvaluatoR , 2015, BMC Research Notes.

[3]  M. Lorieux,et al.  Whole Genome Sequencing of Elite Rice Cultivars as a Comprehensive Information Resource for Marker Assisted Selection , 2015, PloS one.

[4]  E. Nabieva,et al.  The miniature genome of a carnivorous plant Genlisea aurea contains a low number of genes and short non-coding sequences , 2013, BMC Genomics.

[5]  Dmitry Antipov,et al.  Versatile genome assembly evaluation with QUAST-LG , 2018, Bioinform..

[6]  Deanna M. Church,et al.  Assembly: a resource for assembled genomes at NCBI , 2015, Nucleic Acids Res..

[7]  David Haussler,et al.  Using native and syntenically mapped cDNA alignments to improve de novo gene finding , 2008, Bioinform..

[8]  M. Yandell,et al.  A beginner's guide to eukaryotic genome annotation , 2012, Nature Reviews Genetics.

[9]  S. Salzberg,et al.  Sequencing and Assembly of the 22-Gb Loblolly Pine Genome , 2014, Genetics.

[10]  Inanç Birol,et al.  Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species , 2013, GigaScience.

[11]  Evgeny M. Zdobnov,et al.  BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs , 2015, Bioinform..

[12]  Shujun Ou,et al.  Assessing genome assembly quality using the LTR Assembly Index (LAI) , 2018, Nucleic acids research.

[13]  G. Bonnema,et al.  Genome resequencing and comparative variome analysis in a Brassica rapa and Brassica oleracea collection , 2016, Scientific Data.

[14]  M. Berriman,et al.  REAPR: a universal tool for genome assembly evaluation , 2013, Genome Biology.