BOFdat: Generating biomass objective functions for genome-scale metabolic models from experimental data

Genome-scale metabolic models (GEMs) are mathematically structured knowledge bases of metabolism that provide phenotypic predictions from genomic information. GEM-guided predictions of growth phenotypes rely on the accurate definition of a biomass objective function (BOF), which is designed to include key cellular biomass components such as the major macromolecules (DNA, RNA, proteins), lipids, coenzymes, inorganic ions and species-specific components. Despite the importance of the BOF, no standardized computational platform is currently available to generate species-specific biomass objective functions in a data-driven, unbiased fashion. To fill this gap in the metabolic modeling software ecosystem, we implemented BOFdat, a Python package for the definition of a Biomass Objective Function from experimental data. BOFdat has a modular implementation that divides the BOF definition process into three independent modules, referred to here as steps: (1) the coefficients for the major macromolecules are calculated; (2) coenzymes and inorganic ions are identified and their stoichiometric coefficients estimated; and (3) the remaining species-specific metabolic biomass precursors are algorithmically extracted from experimental data in an unbiased way. We used BOFdat to reconstruct the BOF of the Escherichia coli model iML1515, a gold standard in the field. Compared with other methods, the BOF generated by BOFdat yielded the most concordant biomass composition, growth rate, and gene essentiality prediction accuracy. Installation instructions for BOFdat are available in the documentation and the source code is available on GitHub (https://github.com/jclachance/BOFdat).
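To make the first step more concrete, the following minimal sketch shows one way a macromolecular weight fraction could be converted into stoichiometric coefficients, using DNA as the example. It is not BOFdat's implementation or API: the function name `dna_coefficients`, the `RESIDUE_MASS` table with its approximate residue masses, and the inputs in the usage lines are all assumptions introduced for illustration. The sketch also counts bases on a single strand only, which is a simplification.

```python
from collections import Counter

# Approximate masses (g/mol) of nucleotide residues inside a DNA polymer.
# Illustrative values only (assumption), not taken from BOFdat.
RESIDUE_MASS = {"A": 313.2, "C": 289.2, "G": 329.2, "T": 304.2}


def dna_coefficients(sequence, dna_weight_fraction):
    """Return mmol of each nucleotide per gram dry weight (hypothetical helper).

    sequence            -- DNA sequence; characters other than A, C, G, T are ignored
    dna_weight_fraction -- grams of DNA per gram of cell dry weight
    """
    counts = Counter(base for base in sequence.upper() if base in RESIDUE_MASS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")

    # Mole fraction of each base in the sequence.
    mole_fraction = {base: counts[base] / total for base in RESIDUE_MASS}

    # Average mass of one residue (g/mol), weighted by base composition.
    mean_residue_mass = sum(
        mole_fraction[base] * RESIDUE_MASS[base] for base in RESIDUE_MASS
    )

    # Total mol of residues per gDW is weight_fraction / mean_residue_mass;
    # split it by mole fraction and convert mol to mmol.
    return {
        base: 1000.0 * dna_weight_fraction * mole_fraction[base] / mean_residue_mass
        for base in RESIDUE_MASS
    }


# Usage with made-up inputs: a toy sequence and an illustrative weight fraction.
coefficients = dna_coefficients("ATGCGC", 0.031)
for base, value in sorted(coefficients.items()):
    print(base, round(value, 4), "mmol/gDW")
```

A useful sanity check on coefficients derived this way is mass balance: multiplying each coefficient by its residue mass and summing should recover the weight fraction that was supplied.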
