A self-attention based message passing neural network for predicting molecular lipophilicity and aqueous solubility

Efficient and accurate prediction of molecular properties, such as lipophilicity and solubility, is highly desirable for rational compound design in the chemical and pharmaceutical industries. To this end, we build and apply a graph-neural-network framework called the self-attention-based message-passing neural network (SAMPN) to study the relationship between chemical properties and structures in an interpretable way. The main advantages of SAMPN are that it operates directly on chemical graphs and breaks the black-box mold of many machine/deep learning methods. Specifically, its attention mechanism indicates the degree to which each atom of the molecule contributes to the property of interest, and these results are easily visualized. Further, SAMPN outperforms random forests and the deep learning framework MPN from DeepChem in predicting lipophilicity and aqueous solubility. In addition, another formulation of SAMPN (Multi-SAMPN) can predict multiple chemical properties simultaneously, with higher accuracy and efficiency than models that each predict a single property. Moreover, SAMPN produces visualizable, chemically interpretable results, which can help researchers discover new pharmaceuticals and materials. The source code of the SAMPN prediction pipeline is freely available on GitHub (https://github.com/tbwxmu/SAMPN).
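The abstract combines two mechanisms: message passing over the molecular graph and a self-attention step whose per-atom weights indicate how much each atom contributes to the predicted property. The pure-Python snippet below is a minimal sketch of that idea under simplifying assumptions, not SAMPN's actual implementation (see the linked repository for that): it uses atom-level messages (the original model may pass messages along bonds), a single linear attention score per atom, hand-set parameters instead of learned ones, and hypothetical names (`message_passing`, `attention_readout`, `predict`, `W_ATT`, `HEADS`).

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def message_passing(h: Dict[int, Vector], neighbors: Dict[int, List[int]],
                    w_self: Matrix, w_msg: Matrix, steps: int) -> Dict[int, Vector]:
    """Update every atom from its own state plus the summed states of its bonded neighbors."""
    for _ in range(steps):
        new_h = {}
        for v, hv in h.items():
            # Sum the hidden states of all neighbors of atom v (the "message").
            msg = [sum(h[u][i] for u in neighbors[v]) for i in range(len(hv))]
            pre = [a + b for a, b in zip(matvec(w_self, hv), matvec(w_msg, msg))]
            new_h[v] = [math.tanh(x) for x in pre]
        h = new_h  # synchronous update: every atom reads states from the previous step
    return h


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_readout(h: Dict[int, Vector], w_att: Vector) -> Tuple[Dict[int, float], Vector]:
    """Score each atom, softmax-normalise the scores, and pool atoms into one molecule vector."""
    atoms = sorted(h)
    scores = [sum(w * x for w, x in zip(w_att, h[v])) for v in atoms]
    weights = softmax(scores)
    dim = len(h[atoms[0]])
    mol = [sum(a * h[v][i] for a, v in zip(weights, atoms)) for i in range(dim)]
    return dict(zip(atoms, weights)), mol


def predict(mol: Vector, heads: List[Tuple[Vector, float]]) -> List[float]:
    """One linear head per property; several heads sharing `mol` mimic a multi-task setup."""
    return [sum(w * x for w, x in zip(wt, mol)) + b for wt, b in heads]


# Toy molecule: the heavy atoms of ethanol, C(0)-C(1)-O(2), one-hot features [is_C, is_O].
features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
bonds = {0: [1], 1: [0, 2], 2: [1]}

# Hand-set, hypothetical parameters; a real model would learn all of these from data.
W_SELF = [[1.0, 0.0], [0.0, 1.0]]
W_MSG = [[0.5, 0.0], [0.0, 0.5]]
W_ATT = [0.0, 1.0]
HEADS = [([0.8, -1.2], 0.1), ([-0.5, 1.5], -0.3)]  # e.g. a lipophilicity head and a solubility head

hidden = message_passing(features, bonds, W_SELF, W_MSG, steps=2)
atom_weights, mol_vec = attention_readout(hidden, W_ATT)
predictions = predict(mol_vec, HEADS)
print("per-atom attention:", {k: round(v, 3) for k, v in atom_weights.items()})
print("predicted properties:", [round(p, 3) for p in predictions])
```

Because the attention weights are a softmax over atoms, they sum to one and can be mapped directly onto the 2D structure for visualization. Attaching several output heads to the same molecule vector is one simple way to illustrate the multi-property (Multi-SAMPN) setting; the published model's exact architecture may differ.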
