Machine learning based analyses on metabolic networks supports high-throughput knockout screens

Background: Computational identification of new drug targets is a major goal of pharmaceutical bioinformatics.

Results: This paper presents a machine learning strategy to study and validate essential enzymes of a metabolic network. Each enzyme was characterized by its local network topology, gene homologies and co-expression, and flux balance analyses. A machine learning system was trained to distinguish between essential and non-essential reactions. It was validated against a comprehensive experimental dataset consisting of the phenotypic outcomes of single knockout mutants of Escherichia coli (KEIO collection). The predictions were reliable, with high accuracy (93%) and precision (90%). We show that topological, genomic and transcriptomic features describing the network are sufficient for defining the essentiality of a reaction. These features do not depend substantially on specific media conditions, which enabled us to apply our approach to less specific media conditions as well, such as the lysogeny broth rich medium.

Conclusion: Our analysis can be used to validate experimental knockout data from high-throughput screens, can be used to improve flux balance analyses, and supports experimental knockout screens for defining drug targets.
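To make the two reported quality measures concrete, the following minimal sketch shows how accuracy and precision are computed for a binary essential/non-essential classifier. The function name and the toy labels are hypothetical and serve only as an illustration; they are not the KEIO data or the authors' implementation.

```python
def accuracy_and_precision(y_true, y_pred):
    """Return (accuracy, precision) for binary labels, where 1 = essential."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # essential, predicted essential
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-essential, predicted non-essential
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-essential, predicted essential

    # Accuracy: fraction of all reactions classified correctly.
    accuracy = (tp + tn) / len(pairs)
    # Precision: fraction of predicted-essential reactions that are truly essential.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision


# Hypothetical toy labels (assumption for illustration only).
y_true = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]  # experimental knockout outcome
y_pred = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]  # classifier prediction

acc, prec = accuracy_and_precision(y_true, y_pred)
print(f"accuracy={acc:.2f}, precision={prec:.2f}")
```

In a drug target screen, precision matters because each predicted-essential reaction is a candidate for costly experimental follow-up, so false positives are expensive.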
