Scalar Coupling Constant Prediction Using Graph Embedding Local Attention Encoder

The scalar coupling constant (SCC) plays a key role in analyzing the three-dimensional structure of organic compounds; however, traditional SCC prediction based on quantum mechanical calculations is very time-consuming. To calculate the SCC efficiently and accurately, we propose a graph embedding local self-attention encoder (GELAE) model. First, a novel invariant structure representation of the coupling system is constructed from bond lengths, bond angles, and dihedral angles. Next, a local self-attention module embedded with the adjacency matrix of a graph is designed to extract the features of coupling systems effectively. Finally, the SCC is predicted with a modified classification loss function. To validate the proposed method, we conducted a series of comparison experiments using different structure representations, different attention modules, and different losses. The experimental results demonstrate three points. Compared with traditional chemical bond structure representations, the rotation- and translation-invariant structure representations proposed in this work improve SCC prediction accuracy. With the graph-embedded local self-attention, the mean absolute error (MAE) of the prediction model on the validation set decreases from 0.1603 Hz to 0.1067 Hz. Using the classification-based loss function instead of the scaled regression loss further decreases the MAE of the predicted SCC to 0.0963 Hz, which is close to the quantum chemistry standard on the CHAMPS dataset.
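To illustrate why a representation built from bond lengths, bond angles, and dihedral angles is invariant to rotation and translation, the following is a minimal NumPy sketch. It is not the GELAE feature pipeline: the function names, the four-atom chain, and the feature ordering are hypothetical choices made only for this example, and the sign convention of the dihedral angle is one common choice among several.

```python
import numpy as np


def bond_length(a, b):
    """Euclidean distance between atoms a and b."""
    return float(np.linalg.norm(b - a))


def bond_angle(a, b, c):
    """Angle a-b-c in radians, measured at the central atom b."""
    u, v = a - b, c - b
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


def dihedral_angle(a, b, c, d):
    """Signed dihedral angle of the chain a-b-c-d in radians."""
    b1, b2, b3 = b - a, c - b, d - c
    n1 = np.cross(b1, b2)  # normal of the plane through a, b, c
    n2 = np.cross(b2, b3)  # normal of the plane through b, c, d
    x = np.dot(n1, n2)
    y = np.dot(np.cross(n1, n2), b2 / np.linalg.norm(b2))
    return float(np.arctan2(y, x))


def invariant_features(coords):
    """Hypothetical feature vector for a 4-atom chain given as a (4, 3) array:
    three bond lengths, two bond angles, one dihedral angle."""
    a, b, c, d = np.asarray(coords, dtype=float)
    return np.array([
        bond_length(a, b), bond_length(b, c), bond_length(c, d),
        bond_angle(a, b, c), bond_angle(b, c, d),
        dihedral_angle(a, b, c, d),
    ])


def rigid_transform(coords, theta, shift):
    """Rotate by theta about z, then by theta about x, then translate by shift."""
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    rot_z = np.array([[cos_t, -sin_t, 0.0], [sin_t, cos_t, 0.0], [0.0, 0.0, 1.0]])
    rot_x = np.array([[1.0, 0.0, 0.0], [0.0, cos_t, -sin_t], [0.0, sin_t, cos_t]])
    rotation = rot_x @ rot_z
    return np.asarray(coords, dtype=float) @ rotation.T + np.asarray(shift, dtype=float)


if __name__ == "__main__":
    # Made-up coordinates for illustration only (not taken from any dataset).
    chain = np.array([[0.0, 1.0, 0.0],
                      [0.0, 0.0, 0.0],
                      [1.5, 0.0, 0.0],
                      [1.5, 0.3, 1.0]])
    moved = rigid_transform(chain, theta=0.7, shift=[2.0, -1.0, 3.5])
    # The Cartesian coordinates change, but the internal coordinates do not.
    print(np.allclose(invariant_features(chain), invariant_features(moved)))
```

Because every feature depends only on differences between atomic positions and on dot and cross products of those differences, a model fed these features sees the same input regardless of how the molecule is oriented or placed in space.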
