GROOLS: reactive graph reasoning for genome annotation through biological processes

Background: High-quality functional annotation is essential for understanding the phenotypic consequences encoded in a genome. Despite improvements in bioinformatics methods, millions of sequences in databanks are not assigned reliable functions. The curation of protein functions in the context of biological processes is a way to evaluate and improve their annotation.

Results: We developed an expert system using paraconsistent logic, named GROOLS (Genomic Rule Object-Oriented Logic System), that evaluates the completeness and the consistency of predicted functions through biological processes like metabolic pathways. Using a generic and hierarchical representation of knowledge, biological processes are modeled in a graph from which observations (i.e. predictions and expectations) are propagated by rules. At the end of the reasoning, conclusions are assigned to biological process components and highlight uncertainties and inconsistencies. Results on 14 microbial organisms are presented.

Conclusions: GROOLS software is designed to evaluate the overall accuracy of functional unit and pathway predictions according to organism experimental data like growth phenotypes. It assists biocurators in the functional annotation of proteins by focusing on missing or contradictory observations.
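To make the idea of propagating observations through a hierarchical process graph more concrete, the following is a minimal sketch and not the GROOLS implementation. It assumes a Belnap-style four-valued logic (true, false, both, unknown) as a stand-in for the paraconsistent logic mentioned above, a toy graph in which a node combines its children either as required parts ("and") or as alternatives ("or"), and illustrative conclusion labels. All names here (Truth, Node, evaluate, conclude, example_pathway) are hypothetical.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import List, NamedTuple, Optional


class Truth(NamedTuple):
    """Four-valued truth: evidence for and against is tracked independently."""
    pro: bool  # some evidence supports the statement
    con: bool  # some evidence contradicts the statement


UNKNOWN = Truth(False, False)
TRUE = Truth(True, False)
FALSE = Truth(False, True)
BOTH = Truth(True, True)  # contradictory evidence, kept rather than rejected


def conj(a: Truth, b: Truth) -> Truth:
    """Belnap conjunction: supported only if both are, contradicted if either is."""
    return Truth(a.pro and b.pro, a.con or b.con)


def disj(a: Truth, b: Truth) -> Truth:
    """Belnap disjunction: supported if either is, contradicted only if both are."""
    return Truth(a.pro or b.pro, a.con and b.con)


@dataclass
class Node:
    """A component of a biological process (assumed structure, for illustration)."""
    name: str
    mode: str = "and"  # "and": all children required; "or": children are alternatives
    children: List["Node"] = field(default_factory=list)
    observed: Optional[Truth] = None  # prediction attached to a leaf, if any


def evaluate(node: Node) -> Truth:
    """Propagate leaf predictions upward through the hierarchy."""
    if not node.children:
        return UNKNOWN if node.observed is None else node.observed
    combine = conj if node.mode == "and" else disj
    return reduce(combine, (evaluate(child) for child in node.children))


def conclude(prediction: Truth, expectation: Truth) -> str:
    """Compare a propagated prediction with an expectation (illustrative labels)."""
    if prediction == BOTH:
        return "contradictory"
    if expectation in (UNKNOWN, BOTH):
        return "unverified"
    if prediction == UNKNOWN:
        return "missing evidence"
    return "confirmed" if prediction == expectation else "unexpected"


def example_pathway() -> Node:
    """Toy pathway: step 1 has one predicted enzyme, step 2 has two alternatives."""
    step1 = Node("step 1", children=[Node("enzyme A", observed=TRUE)])
    step2 = Node("step 2", mode="or", children=[
        Node("enzyme B"),                  # no prediction available
        Node("enzyme C", observed=FALSE),  # predicted absent
    ])
    return Node("pathway", mode="and", children=[step1, step2])
```

In this toy case, a pathway expected to be present (for example, because of a growth phenotype) would be evaluated with conclude(evaluate(example_pathway()), TRUE); the unresolved second step keeps the pathway from being confirmed, which is the kind of gap a curator would then inspect.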
