TrimNet: learning molecular representation from triplet messages for biomedicine

MOTIVATION: Computational methods accelerate drug discovery and play an important role in biomedical tasks such as molecular property prediction and compound-protein interaction (CPI) identification. A key challenge is to learn useful molecular representations. Early on, molecular properties were mainly calculated with quantum mechanics or predicted with traditional machine learning methods, which require expert knowledge and are often labor-intensive. More recently, graph neural networks have received significant attention because of their powerful ability to learn representations from graph data. Nevertheless, current graph-based methods have limitations that need to be addressed, such as large parameter counts and insufficient extraction of bond information.

RESULTS: In this study, we propose triplet message networks (TrimNet), a graph-based approach that employs a novel triplet message mechanism to learn molecular representations efficiently. We show that TrimNet can accurately complete multiple molecular representation learning tasks, including the prediction of quantum properties, bioactivity, physiology and CPI, with significantly fewer parameters. In the experiments, TrimNet outperforms the previous state-of-the-art methods by a significant margin on various datasets. Besides its small number of parameters and high prediction accuracy, TrimNet can focus on the atoms essential to the target properties, providing a clear interpretation of the prediction tasks. These advantages establish TrimNet as a powerful and useful computational tool for the challenging problem of molecular representation learning.

AVAILABILITY: The quantum and drug datasets are available on the MoleculeNet website: http://moleculenet.ai. The source code is available on GitHub: https://github.com/yvquanli/trimnet.

CONTACT: xjyao@lzu.edu.cn, songsen@tsinghua.edu.cn
