MOSGA: Modular Open-Source Genome Annotator

MOTIVATION: Thanks to recent advances in sequencing technologies, generating high-quality assemblies has become a routine task for many biologists, even for large eukaryotic genomes. However, annotating these assemblies, a crucial step towards unlocking the biology of the organism of interest, remains a complex challenge that often requires advanced bioinformatics expertise.

RESULTS: Here we present MOSGA, a genome annotation framework for eukaryotic genomes with a user-friendly web interface that generates and integrates annotations from various tools. The aggregated results can be analyzed with a fully integrated genome browser and are provided in a format ready for submission to NCBI. MOSGA is built on a portable, customizable, and easily extensible Snakemake backend and can thus be tailored to a wide range of users and projects.

AVAILABILITY: We provide MOSGA as a web service at https://mosga.mathematik.uni-marburg.de and as a Docker container at registry.gitlab.com/mosga/mosga:latest. Source code can be found at https://gitlab.com/mosga/mosga.
