A Mixed Quantum Chemistry/Machine Learning Approach for the Fast and Accurate Prediction of Biochemical Redox Potentials and Its Large-Scale Application to 315 000 Redox Reactions

A quantitative understanding of the thermodynamics of biochemical reactions is essential for accurately modeling metabolism. The group contribution method (GCM) is one of the most widely used approaches to estimate standard Gibbs energies and redox potentials of reactions for which no experimental measurements exist. Previous work has shown that quantum chemical predictions of biochemical thermodynamics are a promising approach to overcome the limitations of GCM. However, quantum chemistry calculations are significantly more computationally expensive than GCM. Here, we use a combination of quantum chemistry and machine learning to obtain a fast and accurate method for predicting the thermodynamics of biochemical redox reactions. We focus on predicting the redox potentials of carbonyl functional group reductions to alcohols and amines, two of the most ubiquitous carbon redox transformations in biology. Our method relies on semiempirical quantum chemistry calculations calibrated with Gaussian process (GP) regression against available experimental data and results in higher predictive power than the GCM at low computational cost. Direct calibration of GCM and fingerprint-based predictions (without quantum chemistry) with GP regression also results in significant improvements in prediction accuracy, demonstrating the versatility of the approach. We design and implement a network expansion algorithm that iteratively reduces and oxidizes a set of natural seed metabolites and demonstrate the high-throughput applicability of our method by predicting the standard potentials of more than 315 000 redox reactions involving approximately 70 000 compounds. Additionally, we develop a novel fingerprint-based framework for detecting molecular environment motifs that are enriched or depleted across different regions of the redox potential landscape. We provide open access to all source code and data generated.
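
The network expansion idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`replace_each`, `expand`), the string patterns, and the seed compound are hypothetical, and the chemistry is reduced to naive substitution on SMILES-like strings, covering only the carbonyl/alcohol interconversion (not amines). A real implementation would presumably use a cheminformatics toolkit to apply reaction rules to molecular graphs.

```python
# Toy sketch of iterative network expansion (assumed, simplified rules).
# Compounds are SMILES-like strings; "reduction" rewrites one carbonyl
# pattern to an alcohol pattern, "oxidation" does the reverse.
CARBONYL = "C(=O)"  # hypothetical pattern for a ketone/aldehyde carbon
ALCOHOL = "C(O)"    # hypothetical pattern for the matching alcohol carbon


def replace_each(s, old, new):
    """Return every string obtained by replacing exactly one occurrence of old."""
    products = []
    start = s.find(old)
    while start != -1:
        products.append(s[:start] + new + s[start + len(old):])
        start = s.find(old, start + 1)
    return products


def expand(seeds, n_iterations):
    """Iteratively reduce and oxidize compounds, starting from the seeds.

    Returns the set of compounds reached and the set of redox reactions,
    each stored as an (oxidized_form, reduced_form) pair.
    """
    compounds = set(seeds)
    reactions = set()
    frontier = set(seeds)
    for _ in range(n_iterations):
        new_compounds = set()
        for compound in frontier:
            # Reduction: the current compound is the oxidized partner.
            for product in replace_each(compound, CARBONYL, ALCOHOL):
                reactions.add((compound, product))
                if product not in compounds:
                    new_compounds.add(product)
            # Oxidation: the current compound is the reduced partner.
            for product in replace_each(compound, ALCOHOL, CARBONYL):
                reactions.add((product, compound))
                if product not in compounds:
                    new_compounds.add(product)
        if not new_compounds:
            break  # the network has stopped growing
        compounds |= new_compounds
        frontier = new_compounds
    return compounds, reactions


if __name__ == "__main__":
    # Hypothetical seed: a diketone written as a SMILES-like string.
    found, rxns = expand(["CC(=O)C(=O)C"], n_iterations=5)
    for oxidized, reduced in sorted(rxns):
        print(oxidized, "->", reduced)
```

Storing each reaction as an (oxidized, reduced) pair means a reaction discovered in both directions is counted once, which matters when the resulting list is passed on to a potential predictor.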
