Putting It All Together: The Design of a Pipeline for Genome-Wide Functional Annotation of Fungi in the Modern Era of "-Omics" Data and Systems Biology

The context for bioinformatics continues to change as new technology brings more varied data in greater volume. We present the preliminary design of a pipeline for functional annotation of fungal genomes. Genome-wide functional annotation benefits both from the variety and volume of data available from “-omics” technologies and from the perspective of systems biology.


[33]  Bernhard Schölkopf,et al.  Fast protein classification with multiple networks , 2005, ECCB/JBI.

[34]  Christina A Cuomo,et al.  Approaches to Fungal Genome Annotation , 2011, Mycology.

[35]  Peter D. Karp,et al.  The Pathway Tools Pathway Prediction Algorithm , 2011, Standards in genomic sciences.

[36]  Christian von Mering,et al.  STRING 7—recent developments in the integration and prediction of protein interactions , 2006, Nucleic Acids Res..

[37]  Erik L. L. Sonnhammer,et al.  InParanoid 7: new algorithms and tools for eukaryotic orthology analysis , 2009, Nucleic Acids Res..

[38]  Gaston H. Gonnet,et al.  OMA 2011: orthology inference among 1000 complete genomes , 2010, Nucleic Acids Res..

[39]  D. Kihara,et al.  PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data , 2009, Proteins.

[40]  A. Krogh,et al.  Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. , 2001, Journal of molecular biology.

[41]  Michael I. Jordan,et al.  Genome-scale phylogenetic function annotation of large and diverse protein families. , 2011, Genome research.

[42]  Dmitrij Frishman,et al.  MIPS: analysis and annotation of genome information in 2007 , 2007, Nucleic Acids Res..

[43]  Peter D. Karp,et al.  Machine learning methods for metabolic pathway prediction , 2010 .

[44]  Jean Armengaud,et al.  A perfect genome annotation is within reach with the proteomics and genomics alliance. , 2009, Current opinion in microbiology.

[45]  C. Claudel-Renard,et al.  Enzyme-specific profiles for genome annotation: PRIAM. , 2003, Nucleic acids research.

[46]  Gong-Xin Yu,et al.  Ruleminer: a Knowledge System for Supporting High-throughput Protein Function Annotations , 2004, J. Bioinform. Comput. Biol..

[47]  Arcady R. Mushegian,et al.  Computational methods for Gene Orthology inference , 2011, Briefings Bioinform..

[48]  J. Thornton,et al.  Predicting protein function from sequence and structural data. , 2005, Current opinion in structural biology.

[49]  U. Sauer,et al.  Global probabilistic annotation of metabolic networks enables enzyme discovery , 2012, Nature chemical biology.

[50]  A. Krogh,et al.  A combined transmembrane topology and signal peptide prediction method. , 2004, Journal of molecular biology.

[51]  Iddo Friedberg,et al.  Automated protein function predictionçthe genomic challenge , 2006 .

[52]  Rolf Apweiler,et al.  Automatic rule generation for protein annotation with the C4.5 data mining algorithm applied on SWISS-PROT , 2001, Bioinform..

[53]  S. Pongor,et al.  The quest for orthologs: finding the corresponding gene across genomes. , 2008, Trends in genetics : TIG.

[54]  R. Sharan,et al.  Network-based prediction of protein function , 2007, Molecular systems biology.

[55]  Paul Horton,et al.  Nucleic Acids Research Advance Access published May 21, 2007 WoLF PSORT: protein localization predictor , 2007 .

[56]  Peter D. Karp,et al.  A Bayesian method for identifying missing enzymes in predicted metabolic pathway databases , 2004, BMC Bioinformatics.

[57]  David Warde-Farley,et al.  GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function , 2008, Genome Biology.

[58]  Peter D. Karp,et al.  Using genome-context data to identify specific types of functional associations in pathway/genome databases , 2007, ISMB/ECCB.

[59]  Daisuke Kihara,et al.  Functional enrichment analyses and construction of functional similarity networks with high confidence function prediction by PFP , 2010, BMC Bioinformatics.

[60]  Elisabeth Coudert,et al.  HAMAP: a database of completely sequenced microbial proteome sets and manually curated microbial protein families in UniProtKB/Swiss-Prot , 2008, Nucleic Acids Res..

[61]  William H. Majoros,et al.  Methods for computational gene prediction , 2007 .

[62]  Anushya Muruganujan,et al.  PANTHER version 7: improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium , 2009, Nucleic Acids Res..

[63]  Peter D. Karp,et al.  A systematic study of genome context methods: calibration, normalization and combination , 2010, BMC Bioinformatics.

[64]  V. Bafna,et al.  Proteogenomics to discover the full coding content of genomes: a computational perspective. , 2010, Journal of proteomics.

[65]  Oliver Kohlbacher,et al.  MultiLoc2: integrating phylogeny and Gene Ontology terms improves subcellular protein localization prediction , 2009, BMC Bioinformatics.

[66]  J. Gogarten,et al.  Using comparative genome analysis to identify problems in annotated microbial genomes. , 2010, Microbiology.

[67]  Gary D. Bader,et al.  The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function , 2010, Nucleic Acids Res..