Applying machine learning techniques to predict the properties of energetic materials

We present a proof of concept that machine learning techniques can be used to predict the properties of CNOHF energetic molecules from their molecular structures. We focus on a small but diverse dataset of 109 molecular structures spread across ten compound classes. Until now, candidate molecules for energetic materials have been screened using predictions from expensive quantum simulations and thermochemical codes. We present a comprehensive comparison of machine learning models and several molecular featurization methods: sum over bonds, custom descriptors, Coulomb matrices, Bag of Bonds, and fingerprints. The best featurization was sum over bonds (bond counting), and the best model was kernel ridge regression. Despite the small dataset, we obtain acceptable out-of-sample errors and Pearson correlations for the prediction of detonation pressure, detonation velocity, explosive energy, heat of formation, density, and other properties. By including another dataset with ≈300 additional molecules in our training, we show how the error can be pushed lower, although the convergence with the number of molecules is slow. Our work paves the way for future applications of machine learning in this domain, including automated lead generation and interpreting machine learning models to obtain novel chemical insights.
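
To make the best-performing combination named above concrete, the following is a minimal sketch of sum over bonds (bond counting) featurization followed by kernel ridge regression. It is not the authors' implementation. The bond-label strings, the three toy molecules, the placeholder target values, the Gaussian (RBF) kernel choice, and the values of `gamma` and `lam` are all assumptions made for illustration; none of them come from the paper's dataset or results.

```python
from collections import Counter
import math


def sum_over_bonds(molecules):
    """Map each molecule (a list of bond-type labels) to a vector of bond counts."""
    vocabulary = sorted({bond for mol in molecules for bond in mol})
    vectors = []
    for mol in molecules:
        counts = Counter(mol)
        vectors.append([float(counts[b]) for b in vocabulary])
    return vocabulary, vectors


def rbf_kernel(x, y, gamma):
    """Gaussian kernel between two feature vectors (an assumed kernel choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def krr_fit(X, y, gamma, lam):
    """Kernel ridge regression: solve (K + lam * I) alpha = y for alpha."""
    K = [[rbf_kernel(a, b, gamma) for b in X] for a in X]
    for i in range(len(X)):
        K[i][i] += lam
    return solve(K, y)


def krr_predict(X_train, alpha, x, gamma):
    """Predict as a kernel-weighted sum over the training molecules."""
    return sum(a * rbf_kernel(xt, x, gamma) for a, xt in zip(alpha, X_train))


# Hand-written toy bond lists (hypothetical, not from the paper's dataset).
toy_molecules = [
    ["C-H"] * 4,                                                # methane-like
    ["C-H"] * 3 + ["C-N", "N=O", "N-O"],                        # nitromethane-like
    ["C-H"] * 2 + ["C-N"] * 2 + ["N=O"] * 2 + ["N-O"] * 2,      # dinitromethane-like
]
toy_targets = [1.0, 2.0, 3.0]  # arbitrary placeholders, not measured properties

vocabulary, X = sum_over_bonds(toy_molecules)
alpha = krr_fit(X, toy_targets, gamma=0.1, lam=1e-8)
train_predictions = [krr_predict(X, alpha, x, gamma=0.1) for x in X]
```

With a regularization strength this small the model nearly interpolates the training targets. In practice `gamma` and `lam` would be chosen by cross-validation, which matters all the more for a dataset as small as the one described here.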
