CoAtGIN: Marrying Convolution and Attention for Graph-based Molecule Property Prediction

Molecule property prediction with computational methods such as density functional theory (DFT) plays a key role in drug discovery and design. Yet these traditional methods are time-consuming and labour-intensive, and cannot meet the demands of modern biomedicine. Thanks to the development of deep learning, many variants of Graph Neural Networks (GNNs) have been proposed for molecular representation learning. However, the existing graph-based methods either perform well at the cost of a large number of parameters, or are lightweight but fail to achieve good results across tasks. To manage this trade-off between efficiency and performance, we propose a novel model architecture, CoAtGIN, which combines convolution and attention. At the local level, a k-hop convolution is designed to capture long-range neighbour information. At the global level, besides using a virtual node to pass identical messages, we utilise linear attention to aggregate the global graph representation according to the importance of each node and edge. On the recent OGB Large-Scale Challenge (OGB-LSC), CoAtGIN achieves a Mean Absolute Error (MAE) of 0.0933 on the large-scale PCQM4Mv2 dataset with only 5.6 M model parameters. Moreover, using the linear attention block improves performance, as it helps capture the global graph representation.
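For intuition, the sketch below illustrates the two ingredients named above on a single graph: a k-hop GIN-style convolution that accumulates messages from neighbourhoods up to k hops away, and a linear-attention readout that weights nodes by importance at O(n) cost rather than the O(n^2) of softmax attention. This is a minimal reconstruction under stated assumptions, not the released CoAtGIN implementation: the class names, the elu(.)+1 kernel feature map, and the dense adjacency format are illustrative choices, and edge features are omitted for brevity.

```python
import torch
import torch.nn as nn


class KHopConv(nn.Module):
    """k-hop GIN-style convolution (illustrative sketch).

    Messages are propagated k times along a dense adjacency matrix so that
    each node sees its k-hop neighbourhood; the hop outputs are summed and
    passed through an MLP, mirroring the GIN update rule.
    """

    def __init__(self, dim: int, k: int = 3):
        super().__init__()
        self.k = k
        self.eps = nn.Parameter(torch.zeros(1))  # learnable self-weight, as in GIN
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (n_nodes, dim); adj: (n_nodes, n_nodes) 0/1 adjacency
        out = (1 + self.eps) * x
        h = x
        for _ in range(self.k):
            h = adj @ h    # one more hop of message passing
            out = out + h  # accumulate multi-hop neighbour information
        return self.mlp(out)


class LinearAttentionPool(nn.Module):
    """Global readout via linear (kernelised) attention.

    With a positive feature map phi, attention is computed as
    phi(Q) (phi(K)^T V) / (phi(Q) phi(K)^T 1), which is linear in the
    number of nodes.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        phi = lambda t: torch.nn.functional.elu(t) + 1  # positive feature map (assumed kernel)
        q, k, v = phi(self.q(x)), phi(self.k(x)), self.v(x)
        kv = k.t() @ v                              # (dim, dim) summary of all nodes
        z = q @ k.sum(dim=0, keepdim=True).t()      # normaliser, (n_nodes, 1)
        attended = (q @ kv) / (z + 1e-6)            # (n_nodes, dim)
        return attended.mean(dim=0)                 # graph-level representation


# Toy usage: a random 12-atom molecule with 64-dim node features.
x = torch.randn(12, 64)
adj = (torch.rand(12, 12) < 0.2).float()
h = KHopConv(64, k=3)(x, adj)
g = LinearAttentionPool(64)(h)
print(g.shape)  # torch.Size([64]) -- one embedding per molecule
```

The key point of the readout is that `kv` is a fixed-size (dim, dim) matrix regardless of molecule size, which is what keeps the global attention lightweight enough to fit the paper's parameter budget.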
