BUSCO: Assessing Genome Assembly and Annotation Completeness.

Genomics drives current progress in molecular biology, generating unprecedented volumes of data. The scientific value of these sequences depends on the ability to evaluate their completeness in a biologically meaningful way. Here, we describe how to use the BUSCO tool suite to assess the completeness of genomes, gene sets, and transcriptomes based on their gene content, as a complement to common technical metrics. The chapter introduces the concept of universal single-copy genes, which underlies the BUSCO methodology, covers the basic requirements for setting up the tool, and provides guidelines for designing the analyses, running the assessments, and interpreting and using the results.
