Structural and Functional Annotation of Eukaryotic Genomes with GenSAS.

The Genome Sequence Annotation Server (GenSAS, https://www.gensas.org) is a secure, web-based genome annotation platform for structural and functional annotation, as well as manual curation. Requiring no installation by users, GenSAS integrates popular command-line annotation tools under a single, easy-to-use online interface. It also incorporates JBrowse and Apollo, so users can view annotation data and manually curate gene models. Embedded instructions and a more in-depth GenSAS User's Guide lead users step by step through the annotation process. In addition to a genome assembly file, users can upload organism-specific transcript, protein, and RNA-seq read evidence for use in the annotation process. The latest versions of the NCBI RefSeq transcript and protein databases and the SwissProt and TrEMBL protein databases are provided for all users. GenSAS projects can be shared with other GenSAS users, enabling collaborative annotation. Once annotation is complete, GenSAS generates final files of the annotated gene models in common file formats for use with other annotation tools, submission to a repository, and use in publications.
