Taxonomic weighting improves the accuracy of a gap-filling algorithm for metabolic models

MOTIVATION: The increasing availability of annotated genome sequences enables the construction of genome-scale metabolic networks, which are useful tools for studying organisms of interest. However, because genome annotations are incomplete, draft metabolic models contain gaps that must be filled, in a time-consuming process, before the models are usable. Optimization-based algorithms that fill these gaps have been developed, but they show significant error rates and often introduce incorrect reactions.

RESULTS: Here, we present a new gap-filling method that uses taxonomic information to compute the costs of candidate gap-filling reactions drawn from a universal reaction database (MetaCyc). When gap-filling a metabolic model for an organism M (such as Escherichia coli), the cost of a reaction R is based on the frequency with which R occurs in other organisms within the phylum of M (in this case, Proteobacteria). The method assumes that different taxonomic groups are biased toward using different metabolic reactions. Evaluated on randomly degraded variants of the EcoCyc metabolic model for Escherichia coli, the new gap-filler raised the average F1-score to 99.0 (using the variable-weights-by-frequency method at the phylum level), compared with 91.0 for the previous MetaFlux gap-filler and 80.3 for a basic gap-filler. Evaluation on two other microbial metabolic models showed similar improvements.

AVAILABILITY AND IMPLEMENTATION: The Pathway Tools software (including MetaFlux) is free for academic use and is available at http://pathwaytools.com. Additional code for reproducing the results presented here is available at www.ai.sri.com/pkarp/pubs/taxgap/supplementary.zip.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
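
The taxonomic weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the MetaFlux implementation: the abstract does not give the cost formula, so the linear mapping from frequency to cost, the `min_cost` and `max_cost` bounds, the function names, and the toy reaction identifiers (`R1` to `R4`) are all assumptions made for illustration. The `f1_score` helper shows one plausible way to score a gap-filler against the reactions removed from a degraded model.

```python
from collections import Counter


def reaction_frequencies(organism_reactions):
    """Return, for each reaction, the fraction of organisms that contain it.

    organism_reactions maps an organism name to the set of reactions in
    its model; all organisms are assumed to share one phylum.
    """
    n = len(organism_reactions)
    counts = Counter(
        r for reactions in organism_reactions.values() for r in set(reactions)
    )
    return {r: c / n for r, c in counts.items()}


def taxonomic_costs(candidates, frequencies, min_cost=1.0, max_cost=10.0):
    """Assign each candidate reaction a cost that falls as frequency rises.

    Assumed (hypothetical) mapping: the cost decreases linearly from
    max_cost (reaction never seen in the phylum) to min_cost (reaction
    seen in every organism of the phylum).
    """
    span = max_cost - min_cost
    return {r: max_cost - span * frequencies.get(r, 0.0) for r in candidates}


def f1_score(added, removed):
    """F1 of the reactions a gap-filler added versus those that were removed."""
    true_positives = len(set(added) & set(removed))
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(set(added))
    recall = true_positives / len(set(removed))
    return 2 * precision * recall / (precision + recall)


# Toy data: four hypothetical organisms from the same phylum.
phylum_models = {
    "org_a": {"R1", "R2"},
    "org_b": {"R1"},
    "org_c": {"R1", "R3"},
    "org_d": {"R1", "R2"},
}
frequencies = reaction_frequencies(phylum_models)
costs = taxonomic_costs(["R1", "R2", "R3", "R4"], frequencies)
score = f1_score(added={"R1", "R2", "R4"}, removed={"R1", "R2", "R3"})
```

In this sketch a reaction found in every organism of the phylum (`R1`) receives the lowest cost, and a reaction never observed there (`R4`) receives the highest, so a cost-minimizing gap-filler would prefer reactions that are common in the target organism's phylum.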
