BoostGAPFILL: improving the fidelity of metabolic network reconstructions through integrated constraint and pattern‐based methods

Motivation: Metabolic network reconstructions are often incomplete. Constraint‐based and pattern‐based methodologies have been used for automated gap filling of these networks, each with its own strengths and weaknesses. Moreover, because validating the hypotheses made by gap filling tools requires experimentation, it is challenging to benchmark performance and to make improvements other than those related to speed and scalability.

Results: We present BoostGAPFILL, an open source tool that leverages both constraint‐based and machine learning methodologies for hypothesis generation in gap filling and metabolic model refinement. BoostGAPFILL captures metabolite patterns in the incomplete network with a matrix factorization formulation and uses them to constrain the set of reactions considered for filling gaps in a metabolic network. We formulate a testing framework based on the available metabolic reconstructions and demonstrate the superiority of BoostGAPFILL over state‐of‐the‐art gap filling tools. We randomly delete a number of reactions from a metabolic network and rate the different algorithms on their ability both to predict the deleted reactions from a universal set and to fill gaps. For most metabolic network reconstructions tested, BoostGAPFILL shows above 60% precision and recall, which is more than twice that of other existing tools.

Availability and Implementation: MATLAB open source implementation (https://github.com/Tolutola/BoostGAPFILL)

Contact: toyetunde@wustl.edu or muhan@wustl.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.
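The deletion-based test described in the Results can be illustrated with a minimal Python sketch. It is not taken from the BoostGAPFILL MATLAB code: the function names `delete_reactions` and `precision_recall`, the reaction identifiers, and the definition of precision and recall as set overlap between predicted and deleted reactions are all assumptions made for illustration.

```python
import random


def delete_reactions(reactions, n_delete, seed=0):
    """Randomly hold out n_delete reactions from a network (hypothetical helper).

    Returns (kept, deleted) as two disjoint sets of reaction identifiers.
    """
    rng = random.Random(seed)
    # Sort first so the sample is reproducible regardless of set ordering.
    deleted = set(rng.sample(sorted(reactions), n_delete))
    kept = set(reactions) - deleted
    return kept, deleted


def precision_recall(predicted, deleted):
    """Score a gap filler's predictions against the held-out reactions.

    Assumed definitions:
      precision = |predicted AND deleted| / |predicted|
      recall    = |predicted AND deleted| / |deleted|
    """
    predicted, deleted = set(predicted), set(deleted)
    true_positives = len(predicted & deleted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(deleted) if deleted else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy network with made-up reaction identifiers.
    network = {"R1", "R2", "R3", "R4", "R5", "R6"}
    kept, deleted = delete_reactions(network, n_delete=2, seed=42)
    # A real gap filler would choose reactions from a universal set given
    # `kept`; here the prediction is a fixed placeholder.
    predicted = {"R2", "R3", "R9"}
    print(precision_recall(predicted, deleted))
```

In an actual benchmark, the placeholder prediction would be replaced by the output of each gap filling tool run on the reduced network, and the scores would be averaged over repeated random deletions.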
