Optimization based automated curation of metabolic reconstructions

Background
Tens of microbial and eukaryotic metabolic reconstructions currently exist (e.g., Escherichia coli, Saccharomyces cerevisiae, Bacillus subtilis), with many more under development. All of these reconstructions are inherently incomplete, with some functionalities missing due to the lack of experimental and/or homology information. A key challenge in the automated generation of genome-scale reconstructions is the elucidation of these gaps and the subsequent generation of hypotheses to bridge them.

Results
In this work, an optimization-based procedure is proposed to identify and eliminate network gaps in these reconstructions. First, we identify the metabolites in the metabolic network reconstruction that cannot be produced under any uptake conditions. Subsequently, we identify the reactions from a customized multi-organism database that restore the connectivity of these metabolites to the parent network. This connectivity restoration is hypothesized to take place through four mechanisms: a) reversing the directionality of one or more reactions in the existing model, b) adding reactions from another organism to provide functionality absent in the existing model, c) adding external transport mechanisms to allow for importation of metabolites in the existing model, and d) adding intracellular transport reactions in multi-compartment models. We demonstrate this procedure for the genome-scale reconstructions of Escherichia coli and Saccharomyces cerevisiae; in the latter, compartmentalization of intracellular reactions results in a more complex topology of the metabolic network. We determine that about 10% of metabolites in E. coli and 30% of metabolites in S. cerevisiae cannot carry any flux. Interestingly, the dominant flow restoration mechanism is directionality reversal of existing reactions in the respective models.

Conclusion
We have proposed systematic methods to identify and fill gaps in genome-scale metabolic reconstructions. The identified gaps can be filled both by modifying the existing model and by adding missing reactions through reconciliation of multi-organism reaction databases with existing genome-scale models. The computational results provide a list of hypotheses to be queried further and tested experimentally.
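To make the gap-identification step and mechanism (a) concrete, the following minimal sketch uses a purely structural check on a toy network. It is not the optimization formulation proposed in this work: it only flags metabolites that no reaction can produce, and it does not enforce steady-state flux balance. The network `TOY`, the uptake set, and the function names `root_no_production` and `reversal_candidates` are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: reaction name -> (stoichiometry, reversible flag).
# Negative coefficients are consumed, positive coefficients are produced.
Reaction = Tuple[Dict[str, int], bool]


def can_produce(stoich: Dict[str, int], reversible: bool, met: str) -> bool:
    """True if the reaction can yield `met` in at least one allowed direction."""
    coeff = stoich.get(met, 0)
    return coeff > 0 or (reversible and coeff < 0)


def root_no_production(reactions: Dict[str, Reaction], uptake: Set[str]) -> Set[str]:
    """Metabolites that are neither taken up nor produced by any reaction."""
    mets = {m for stoich, _ in reactions.values() for m in stoich}
    return {
        m for m in mets
        if m not in uptake
        and not any(can_produce(s, rev, m) for s, rev in reactions.values())
    }


def reversal_candidates(reactions: Dict[str, Reaction], met: str) -> List[str]:
    """Mechanism (a): irreversible reactions that only consume `met`;
    reversing any of them would give `met` a producing reaction."""
    return sorted(
        name for name, (s, rev) in reactions.items()
        if not rev and s.get(met, 0) < 0
    )


# Toy network (assumed for illustration): D is consumed but never produced.
TOY: Dict[str, Reaction] = {
    "R1": ({"A": -1, "B": 1}, False),
    "R2": ({"B": -1, "C": 1}, True),
    "R3": ({"D": -1, "C": 1}, False),
}

blocked = root_no_production(TOY, uptake={"A"})
candidates = {m: reversal_candidates(TOY, m) for m in blocked}
```

In this toy case, D is the only blocked metabolite and reversing R3 is the only reversal hypothesis. A flux-based formulation would additionally catch metabolites that have a producing reaction on paper but still cannot carry flux at steady state.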
