Semisupervised Gaussian Process for Automated Enzyme Search.

Synthetic biology is increasingly used to design novel, greener biosynthetic routes for producing value-added chemicals and natural products. Designing a novel pathway often requires careful selection of the enzyme sequences to import into the chassis organism at each reaction step. To address this design requirement in an automated way, we present a tool for exploring the space of enzymatic reactions. Given a reaction and an enzyme, the tool provides a probability estimate that the enzyme catalyzes the reaction. The tool first assesses the similarity of the query reaction to known biochemical reactions with respect to signatures around their reaction centers. Signatures are defined from chemical transformation rules using extended-connectivity fingerprint descriptors. A semisupervised Gaussian process model associated with the similar known reactions then provides the probability estimate, using information about both the reaction and the enzyme. These estimates were validated experimentally by applying the Gaussian process model to a newly identified metabolite in Escherichia coli in order to search for the enzymes catalyzing its associated reactions. Furthermore, we show with several pathway design examples how the ability to assign probability estimates to enzymatic reactions can assist bioengineering applications, providing experimental validation of the proposed approach. To the best of our knowledge, this is the first application of Gaussian processes to problems involving both biological sequences and chemicals; the use of a semisupervised Gaussian process framework is also novel in the context of machine learning applied to bioinformatics. However, the ability of an enzyme to catalyze a reaction depends on the affinity between the enzyme and the substrates of the reaction. This affinity is generally quantified by the Michaelis constant KM. We therefore also demonstrate the use of Gaussian process regression to predict KM for a given substrate-enzyme pair.
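
As a rough illustration of the regression step, the following minimal sketch computes the posterior mean and variance of a Gaussian process with a squared-exponential kernel. It is not the model described above: the feature vectors, the log10(KM) values, the kernel choice, and the names `rbf_kernel`, `gp_predict`, `train_x`, and `train_log_km` are all assumptions made for illustration, and the numbers are placeholders rather than measured data. Only the Python standard library is used so that the arithmetic is visible; in practice a library implementation would normally be preferred.

```python
import math

def rbf_kernel(x, y, length_scale=1.0):
    """Squared-exponential kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def solve_lower(low, b):
    """Solve low @ x = b by forward substitution."""
    x = []
    for i, row in enumerate(low):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x

def solve_lower_transposed(low, b):
    """Solve low.T @ x = b by back substitution."""
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / low[i][i]
    return x

def gp_predict(train_x, train_y, query, noise=1e-2, length_scale=1.0):
    """Posterior mean and variance of the latent function at one query point.

    The targets are centered on their mean, so the prior mean of the GP is
    the average training target.
    """
    n = len(train_x)
    y_mean = sum(train_y) / n
    centered = [y - y_mean for y in train_y]
    # Kernel matrix with observation noise added on the diagonal.
    k = [[rbf_kernel(train_x[i], train_x[j], length_scale)
          + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    low = cholesky(k)
    alpha = solve_lower_transposed(low, solve_lower(low, centered))
    k_star = [rbf_kernel(x, query, length_scale) for x in train_x]
    mean = y_mean + sum(ks * a for ks, a in zip(k_star, alpha))
    v = solve_lower(low, k_star)
    var = rbf_kernel(query, query, length_scale) - sum(vi * vi for vi in v)
    return mean, var

# Placeholder data (assumption): each row stands for one substrate-enzyme
# pair encoded as a small numeric feature vector; targets are log10(KM).
train_x = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0], [1.5, 1.5, 1.5]]
train_log_km = [-1.0, -0.5, 0.3, 1.0]

query = [0.7, 0.7, 0.0]  # hypothetical unseen substrate-enzyme pair
mean, var = gp_predict(train_x, train_log_km, query)
print(f"predicted log10(KM) = {mean:.3f}, posterior variance = {var:.3f}")
```

Regressing on log10(KM) rather than KM itself is a design choice of this sketch, made because Michaelis constants span several orders of magnitude. The posterior variance gives a per-prediction uncertainty: it is small near training pairs and approaches the prior variance for pairs unlike anything in the training set.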

[27]  Gavin Brown,et al.  Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection , 2012, J. Mach. Learn. Res..