Practical Guide for Fungal Gene Prediction from Genome Assembly and RNA-Seq Reads by FunGAP.

FunGAP is a Python-wrapped fungal genome annotation pipeline that runs on Linux/Unix operating systems. Its annotation procedure requires two inputs: a genome assembly and RNA-seq reads. FunGAP aims to select the most plausible gene model from all candidate gene models produced by several gene prediction programs, which use ab initio, EST-based, and/or homology-based strategies. This guide covers how to run FunGAP from the command line and how to use its options for practical gene prediction. Users can choose options for quality control of the input sequences, selection of a model database, filtering of predicted gene models, and post-processing steps such as assessing genome completeness and checking for transposable elements. Using FunGAP, users can obtain high-quality fungal gene predictions for post-genome-sequencing analysis.
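The idea of choosing one gene model per locus from the output of several predictors can be illustrated with a toy sketch. The abstract does not describe how FunGAP scores or selects models, so everything below is an assumption made for illustration: the `GeneModel` record, the numeric `score` field, the predictor labels, and the greedy "highest score wins, drop overlapping competitors" rule are all hypothetical and are not FunGAP's actual method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    """Hypothetical record for one predicted gene model (not a FunGAP type)."""
    source: str   # label of the predictor that produced the model
    contig: str   # sequence the model lies on
    start: int    # 1-based inclusive start coordinate
    end: int      # 1-based inclusive end coordinate
    score: float  # assumed evidence score; higher is better


def overlaps(a: GeneModel, b: GeneModel) -> bool:
    """Return True if two models share at least one base on the same contig."""
    return a.contig == b.contig and a.start <= b.end and b.start <= a.end


def select_models(candidates: list[GeneModel]) -> list[GeneModel]:
    """Greedily keep the best-scoring model among each set of overlapping ones."""
    chosen: list[GeneModel] = []
    for model in sorted(candidates, key=lambda m: m.score, reverse=True):
        # Keep a model only if no better-scoring overlapping model was kept.
        if not any(overlaps(model, kept) for kept in chosen):
            chosen.append(model)
    # Report the result in genomic order.
    return sorted(chosen, key=lambda m: (m.contig, m.start))


# Made-up candidates from three hypothetical predictors.
candidates = [
    GeneModel("predictor_a", "ctg1", 100, 900, 0.8),
    GeneModel("predictor_b", "ctg1", 150, 950, 0.9),
    GeneModel("predictor_c", "ctg1", 2000, 3000, 0.5),
    GeneModel("predictor_a", "ctg2", 100, 900, 0.4),
]
selected = select_models(candidates)
for model in selected:
    print(model.contig, model.start, model.end, model.source, model.score)
```

In this sketch the two overlapping models on `ctg1` compete and only the higher-scoring one survives, while non-overlapping models are kept regardless of score. A real pipeline would derive the scores from evidence such as transcript or protein alignments rather than from fixed numbers.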
