Automated alignment-based curation of gene models in filamentous fungi

Background

Automated gene-calling is still an error-prone process, particularly for the highly plastic genomes of fungal species. Improving gene models through quality control and manual curation is time-consuming, requires skilled biologists, and is rarely performed. The wealth of available fungal genomes has not yet been exploited by an automated method that applies quality control to gene models in order to obtain more accurate genome annotations.

Results

We provide a novel method named alignment-based fungal gene prediction (ABFGP) that is particularly suitable for plastic genomes like those of fungi. It assesses gene models on a gene-by-gene basis, making use of informant gene loci. Its performance was benchmarked on 6,965 gene models confirmed by full-length unigenes from ten different fungi; ABFGP correctly predicted 79.4% of these gene models. It improves the output of ab initio gene prediction software through higher sensitivity and precision for all gene model components. Applicability of the method was shown by revisiting the annotations of six different fungi, using gene loci from up to 29 fungal genomes as informants. Between 7,231 and 8,337 genes were assessed by ABFGP per genome, and between 1,724 and 3,505 gene model revisions were proposed for each genome. The reliability of the proposed gene models is assessed by an a posteriori introspection procedure applied to each intron and exon in the multiple gene model alignment. The total number and type of proposed gene model revisions in the six fungal genomes are correlated with the quality of the genome assembly and with the sequencing strategies used by the sequencing centre, highlighting different types of errors in different annotation pipelines. The ABFGP method is particularly successful in discovering sequence errors and/or disruptive mutations that cause truncated and erroneous gene models.

Conclusions

The ABFGP method is an accurate and fully automated quality control method for fungal gene catalogues that can be easily implemented into existing annotation pipelines. With the exponential release of new genomes, the ABFGP method will help decrease the number of gene models that require additional manual curation.
