Evaluating Genome Assemblies and Gene Models Using gVolante

In routine de novo genome assembly and gene prediction, it is natural to want to evaluate the resulting products. Different programs and parameter settings yield variable outputs, leaving the user to decide which output to adopt for downstream analysis addressing biological questions. Rather than relying on superficial length-based statistics of output sequences (e.g., N50 scaffold length), completeness assessment, which scores the coverage of reference orthologs, has been increasingly used. We previously launched a web service, gVolante (https://gvolante.riken.jp/), to provide a user-friendly interface and a uniform environment for completeness assessment with the pipelines CEGMA and BUSCO. Completeness assessments performed on gVolante report scores based not only on the coverage of reference genes but also on sequence lengths, allowing quality control from multiple aspects. This chapter focuses on the procedure for such assessment and provides technical tips for higher accuracy.
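To make the contrast between the two kinds of metric concrete, the following is a minimal sketch in Python. It is not gVolante, CEGMA, or BUSCO code: the function names `n50` and `completeness_percent`, the scaffold lengths, and the ortholog counts are all hypothetical and chosen only for illustration. N50 is computed in the usual way, as the length of the shortest sequence among the longest sequences that together cover at least half of the total assembly length. Completeness is simplified here to the percentage of reference orthologs detected, whereas the actual pipelines apply their own detection criteria.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the
    length at which the running total first reaches half of the sum.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def completeness_percent(detected, reference_total):
    """Return the percentage of reference orthologs that were detected.

    This is a simplification: real pipelines decide what counts as
    'detected' using their own alignment and scoring thresholds.
    """
    if reference_total <= 0:
        raise ValueError("reference_total must be positive")
    if not 0 <= detected <= reference_total:
        raise ValueError("detected must lie between 0 and reference_total")
    return 100.0 * detected / reference_total


# Hypothetical scaffold lengths (in bp) for a toy assembly.
scaffold_lengths = [100, 80, 60, 40, 20]
print("N50:", n50(scaffold_lengths))

# Hypothetical counts: 225 of 250 reference orthologs detected.
print("Completeness (%):", completeness_percent(225, 250))
```

The two numbers are independent of each other: an assembly can reach a long N50 by joining sequences incorrectly while still missing many reference genes, which is why the length-based statistic alone is considered superficial.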
