Large scale methods to enumerate extreme rays and elementary modes

Quantitative approaches to biological systems based on mathematical modeling have gained popularity in recent years. Systems biology considers the biological system as a whole rather than studying individual components and interactions in isolation. This is very challenging even for simple bacteria, and various modeling techniques have been proposed that approach the enormous complexity at different levels. Simple methods at a high level of abstraction often lack predictive power and significance; detailed models capturing thermodynamic particulars are often limited by incomplete mechanistic knowledge and unknown kinetic parameters. Structural analysis, as an intermediate approach, is based on network stoichiometries, which are usually well known. Constraint-based methods such as linear or convex optimization can predict reaction fluxes, growth rates or the viability of knockout mutants with high confidence. However, their significance depends heavily on the underlying objectives, and alternative solutions in the flux space are mostly ignored. Comprehensive approaches exist that aim at analyzing the flux space as a whole. The flux space is a high-dimensional solution space for reaction fluxes, bounded by linear constraints on the flux values. Pathway analysis methods define minimal functional modes in the network that comply with the constraints. All possible operation modes are superpositions of such elementary modes. Methods performing the analysis transform the descriptive constraints into generative basis vectors. Mathematically, the flux space forms a polyhedral cone, and the algorithms originate from computational geometry, performing a representation conversion for the cone. Extreme ray enumeration, facet enumeration, vertex enumeration and convex hull computation are representatives of this family, and they are all related, if not equivalent.
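To make the conversion from descriptive constraints to generative vectors concrete, the following is a minimal sketch and not the implementation developed in this thesis. It assumes that all reactions are irreversible, so that the flux cone is {v : N v = 0, v >= 0} for a stoichiometric matrix N, and it uses a naive iterative ray-combination scheme with a support-based elementarity filter and exact integer arithmetic. The function name `elementary_modes` and the toy network are hypothetical and chosen only for illustration.

```python
from functools import reduce
from math import gcd


def normalize(ray):
    """Divide an integer ray by the gcd of its entries (rays are scale-free)."""
    g = reduce(gcd, ray)
    return tuple(x // g for x in ray) if g > 1 else tuple(ray)


def support(ray):
    """Indices of the reactions that carry nonzero flux in this ray."""
    return frozenset(i for i, x in enumerate(ray) if x != 0)


def elementary_modes(N):
    """Enumerate extreme rays of {v : N v = 0, v >= 0} for an integer matrix N.

    Illustrative assumption: every reaction is irreversible. All pairs of
    rays are combined (no adjacency test), so this only suits toy networks.
    """
    n = len(N[0])
    # Start from the nonnegative orthant, generated by the unit vectors.
    rays = [tuple(int(i == j) for j in range(n)) for i in range(n)]
    for row in N:  # add one steady-state constraint (metabolite) at a time
        dots = [sum(a * x for a, x in zip(row, r)) for r in rays]
        pos = [(r, d) for r, d in zip(rays, dots) if d > 0]
        neg = [(r, d) for r, d in zip(rays, dots) if d < 0]
        # Rays that already satisfy the new constraint are kept.
        candidates = {r for r, d in zip(rays, dots) if d == 0}
        # Nonnegative combination of a positive and a negative ray that
        # lies exactly on the hyperplane row . v = 0.
        for p, dp in pos:
            for q, dq in neg:
                combined = tuple(dp * b - dq * a for a, b in zip(p, q))
                candidates.add(normalize(combined))
        # Elementarity filter: keep only rays with minimal support.
        supports = {r: support(r) for r in candidates}
        rays = [r for r in candidates
                if not any(supports[o] < supports[r] for o in candidates)]
    return sorted(rays)


# Toy network: R1: -> A, R2: A -> B, R3: B ->, R4: A ->
N = [
    [1, -1, 0, -1],  # metabolite A
    [0, 1, -1, 0],   # metabolite B
]
for mode in elementary_modes(N):
    print(mode)
```

For this toy network the sketch yields two modes, one routing flux through B (R1, R2, R3) and one exporting A directly (R1, R4). Combining every positive ray with every negative ray is what makes the naive scheme blow up on larger networks; restricting the combinations to adjacent rays and testing elementarity efficiently is where practical implementations differ.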
Unfortunately, the computation suffers from combinatorial explosion and is computationally intensive. At the beginning of this work, no implementation was applicable to genome-scale metabolic networks. Improving the algorithms towards genome-scale application is the declared goal of this thesis. The double description method is an algorithm to enumerate the extreme rays of a polyhedral cone, or elementary modes in biological terminology. It has proven efficient especially for degenerate problems, where points and constraints are not in general (e.g. random) position. Most biochemical networks lead to degenerate problems, hence the double description method is usually chosen for pathway computations. Our own implementation is also based on this method, and the present work describes the most important aspects of attaining an efficient implementation. Various parts of the algorithm are performance-critical and must be considered. It starts with the input data: we review and propose methods to sort and compress the data structures. We also show how to deal with different number types, since exact arithmetic is needed for certain ill-conditioned problem cases. A central part of the algorithm deals with elementarity of the
