Estimating Cellular Goals from High-Dimensional Biological Data

Optimization-based models have been used to predict cellular behavior for over 25 years. The constraints in these models are derived from genome annotations, from the measured macromolecular composition of cells, and from measurements of the cell's growth rate and metabolism in different conditions. The cellular goal (the optimization problem that the cell is trying to solve) can be challenging to derive experimentally for many organisms, including human and other mammalian cells, which have complex metabolic capabilities and are not well understood. Existing approaches to learning goals from data include (a) estimating a linear objective function, or (b) estimating linear constraints that model complex biochemical reactions and constrain the cell's operation. The latter approach is important because the known reactions are often not enough to explain observations; the model complexity therefore needs to be extended automatically by learning new reactions. However, this leads to nonconvex optimization problems, and existing tools cannot scale to realistically large metabolic models. Hence, constraint estimation is still used sparingly despite its benefits for modeling cell metabolism, which is important for developing novel antimicrobials against pathogens, discovering cancer drug targets, and producing value-added chemicals. Here, we develop the first approach to estimating constraint reactions from data that can scale to realistically large metabolic models. Previous tools were applied to problems with fewer than 75 reactions and 60 metabolites, which rules out real-life-size applications. We perform extensive experiments using 75 large-scale metabolic network models for different organisms (including bacteria, yeasts, and mammals) and show that our algorithm can recover cellular constraint reactions. The recovered constraints enable accurate prediction of metabolic states in hundreds of growth environments not seen in training data, and we recover useful cellular goals even when some measurements are missing.
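
To make the distinction between a known linear objective and an estimated constraint reaction concrete, the following is a minimal sketch in standard constraint-based notation. It is an illustration under stated assumptions, not the formulation used in this work: the symbols $S$ (stoichiometric matrix), $v$ (flux vector), $c$ (objective weights), $l, u$ (flux bounds), $a$ (stoichiometric column of an unknown reaction), $w$ (its flux), and $\hat{v}^{(k)}$ (fluxes measured in condition $k$) are all introduced here for illustration only.

```latex
% Forward problem (assumed standard form): predict fluxes for a known goal c.
\begin{align}
  \max_{v}\quad & c^{\top} v \\
  \text{s.t.}\quad & S v = 0, \qquad l \le v \le u .
\end{align}

% Inverse problem (illustrative): estimate an unknown reaction column a
% from fluxes \hat{v}^{(k)} measured in conditions k = 1, \dots, K.
\begin{align}
  \min_{a,\,\{v^{(k)},\, w^{(k)}\}}\quad
    & \sum_{k=1}^{K} \bigl\| v^{(k)} - \hat{v}^{(k)} \bigr\|_2^2 \\
  \text{s.t.}\quad
    & \bigl(v^{(k)}, w^{(k)}\bigr) \in \arg\max_{v,\, w}
      \Bigl\{\, c^{\top} v \;:\; S v + a\, w = 0,\;
      l^{(k)} \le v \le u^{(k)},\; w \ge 0 \Bigr\},
      \quad k = 1, \dots, K .
\end{align}
```

In a formulation of this kind, the product $a\,w$ couples two unknowns, so the mass-balance constraint is bilinear, and the inner $\arg\max$ makes the problem bilevel. This is one way to see why constraint estimation leads to nonconvex problems while the forward prediction remains a linear program.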
