Predicting Genes in Single Genomes with AUGUSTUS

AUGUSTUS is a tool for finding protein‐coding genes and their exon‐intron structure in genomic sequences. It does not strictly require additional experimental input, because it can be run in so‐called ab initio mode. However, extrinsic evidence from sources such as transcriptome sequencing or the annotations of closely related genomes can be integrated to improve the accuracy and completeness of the annotation. AUGUSTUS can be applied to single genomes or simultaneously to several aligned genomes. Here, we describe the steps required to train AUGUSTUS for the annotation of individual genomes and the steps to perform the actual structural annotation. We also describe how to generate and integrate extrinsic evidence from various sources. © 2018 by John Wiley & Sons, Inc.
