Optimizing Chemical Reactions with Deep Reinforcement Learning

Deep reinforcement learning was employed to optimize chemical reactions. Our model iteratively records the result of a chemical reaction and chooses new experimental conditions to improve the reaction outcome. The model outperformed a state-of-the-art black-box optimization algorithm, using 71% fewer steps in both simulations and real reactions. Furthermore, we introduced an efficient exploration strategy that draws the reaction conditions from certain probability distributions, which reduced the regret from 0.062 to 0.039 compared with a deterministic policy. By combining this exploration policy with accelerated microdroplet reactions, optimal reaction conditions were determined within 30 min for each of the four reactions considered, and a better understanding of the factors that control microdroplet reactions was reached. Moreover, the model performed better after training on reactions with similar or even dissimilar underlying mechanisms, which demonstrates its learning ability.
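The optimization loop described above (record an outcome, choose new conditions by sampling from a probability distribution, and track regret) can be illustrated with a minimal sketch. It is not the model from this work: the yield surface `toy_yield`, the two normalized conditions, the Gaussian proposal around the best conditions found so far, and the definition of regret as the mean gap to the optimum are all assumptions made for illustration.

```python
import random


def toy_yield(temperature, flow_rate):
    """Hypothetical reaction surface (an assumption, not real chemistry).

    Both conditions are normalized to [0, 1]; the maximum yield of 1.0
    is reached at temperature=0.6, flow_rate=0.3.
    """
    return max(0.0, 1.0 - (temperature - 0.6) ** 2 - (flow_rate - 0.3) ** 2)


def clip(x, lo=0.0, hi=1.0):
    """Keep a sampled condition inside its allowed range."""
    return min(hi, max(lo, x))


def optimize(steps=40, sigma=0.1, seed=0):
    """Stochastic search: sample new conditions around the best ones so far."""
    rng = random.Random(seed)
    best = (0.5, 0.5)                # starting conditions
    best_yield = toy_yield(*best)
    history = [best_yield]           # outcome recorded at every step
    for _ in range(steps):
        # Draw the next conditions from a Gaussian centered on the current best.
        candidate = tuple(clip(rng.gauss(c, sigma)) for c in best)
        observed = toy_yield(*candidate)
        history.append(observed)
        if observed > best_yield:
            best, best_yield = candidate, observed
    return best, best_yield, history


def average_regret(history, optimum=1.0):
    """Mean gap between the optimal outcome and each observed outcome
    (an assumed definition of regret, chosen for this sketch)."""
    return sum(optimum - y for y in history) / len(history)


if __name__ == "__main__":
    best, best_yield, history = optimize()
    print("best conditions:", best)
    print("best yield:", best_yield)
    print("average regret:", average_regret(history))
```

Setting `sigma` to zero makes the proposal deterministic, so the search never leaves its starting conditions; a nonzero `sigma` is the simplest way to see why drawing conditions from a distribution helps exploration.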
