Spectral deep learning for prediction and prospective validation of functional groups

State-of-the-art identification of the functional groups present in an unknown chemical entity requires the expertise of a skilled spectroscopist to analyse and interpret Fourier transform infrared (FTIR), mass spectrometry (MS) and/or nuclear magnetic resonance (NMR) data. This process can be time-consuming and error-prone, especially for complex chemical entities that are poorly characterised in the literature, and it is inefficient to use alongside synthetic robots that produce molecules at an accelerated rate. Herein, we introduce a fast, multi-label deep neural network for accurately identifying all the functional groups of unknown compounds using a combination of FTIR and MS spectra. We do not use any database, pre-established rules, procedures, or peak-matching methods. Our trained neural network reveals patterns typically used by human chemists to identify standard groups. Finally, we experimentally validated that our neural network, trained on single compounds, can predict functional groups in compound mixtures. Our methodology demonstrates practical utility for future use in autonomous analytical detection.
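As a rough illustration of what "multi-label" means here, the sketch below scores each functional group independently with its own sigmoid output, so a compound can be assigned several groups at once. It is not the authors' architecture: it uses a single linear layer rather than a deep network, and the function name `predict_groups`, the label names, the toy two-point spectra, and the hand-picked weights are all hypothetical choices made only for this example.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_groups(ftir, ms, weights, biases, labels, threshold=0.5):
    """Return every label whose independent sigmoid score reaches the threshold.

    Hypothetical helper: one linear layer stands in for a trained deep network.
    """
    # Combine the two spectra into a single input vector.
    features = list(ftir) + list(ms)
    present = []
    for label, w_row, b in zip(labels, weights, biases):
        score = sigmoid(sum(w * x for w, x in zip(w_row, features)) + b)
        # Each label is decided on its own; the scores need not sum to 1.
        if score >= threshold:
            present.append(label)
    return present


# Toy inputs (assumed values, not real spectra or trained parameters).
labels = ["alcohol", "ketone", "amine"]
ftir = [1.0, 0.0]
ms = [0.0, 1.0]
weights = [
    [4.0, 0.0, 0.0, 0.0],    # "alcohol" responds to the first FTIR bin
    [0.0, 0.0, 0.0, 4.0],    # "ketone" responds to the second MS bin
    [-4.0, 0.0, 0.0, -4.0],  # "amine" is suppressed by both
]
biases = [-2.0, -2.0, 0.0]

result = predict_groups(ftir, ms, weights, biases, labels)
print(result)
```

Because each output is thresholded separately, two labels are returned for the same input in this toy case, which a single softmax (multi-class) output could not express.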
