Molecular property prediction by semantic-invariant contrastive learning

Contrastive learning has been widely used as a pretext task for self-supervised pre-training of molecular representation models in AI-aided drug design and discovery. However, existing methods that generate molecular views through noise-adding operations may suffer from semantic inconsistency: the perturbed view no longer describes the same molecule, which yields false positive pairs and consequently poor prediction performance. To address this problem, we first propose a semantic-invariant view generation method that properly breaks a molecular graph into pairs of fragments. We then develop a Fragment-based Semantic-Invariant Contrastive Learning (FraSICL) model on top of this view generation method for molecular property prediction. FraSICL consists of two branches that produce view representations for contrastive learning, and it introduces a multi-view fusion mechanism together with an auxiliary similarity loss to better exploit the information contained in the different fragment-pair views. Extensive experiments on various benchmark datasets show that FraSICL achieves state-of-the-art performance with the fewest pre-training samples among the major existing counterpart models.
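
To make the fragment-pair idea concrete, below is a minimal sketch of enumerating such views with RDKit, assuming each view is obtained by cutting one bond so that the two fragments jointly cover the whole molecule and no substructure information is discarded. The function name and the restriction to acyclic single bonds are illustrative assumptions, not the authors' released code.

from rdkit import Chem

def fragment_pair_views(smiles):
    # Enumerate candidate fragment-pair views: break one acyclic
    # single bond at a time; each resulting pair jointly covers the
    # whole molecule, so the molecular semantics are preserved.
    mol = Chem.MolFromSmiles(smiles)
    views = []
    for bond in mol.GetBonds():
        # Breaking ring bonds or multiple bonds would not yield two
        # clean fragments, so only acyclic single bonds are cut.
        if bond.IsInRing() or bond.GetBondType() != Chem.BondType.SINGLE:
            continue
        broken = Chem.FragmentOnBonds(mol, [bond.GetIdx()], addDummies=True)
        frags = Chem.GetMolFrags(broken, asMols=True)
        if len(frags) == 2:
            views.append(tuple(Chem.MolToSmiles(f) for f in frags))
    return views

# Each element of the returned list is one fragment-pair view; for
# ethoxybenzene, cutting the ether C-O bond gives an ethyl fragment
# and a phenoxy fragment (dummy atoms mark the broken bond).
print(fragment_pair_views("CCOc1ccccc1"))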

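The two-branch objective described above can likewise be sketched. The following PyTorch code is a plausible reconstruction under explicit assumptions (mean-pooling fusion across views, a SimCLR-style NT-Xent contrastive term, and a cosine-based auxiliary similarity term); the function names, temperature, and loss weighting are placeholders rather than the paper's verified implementation.

import torch
import torch.nn.functional as F

def nt_xent(anchor, view, tau=0.1):
    # SimCLR-style normalized-temperature cross entropy: row i of
    # `anchor` and row i of `view` embed the same molecule.
    anchor = F.normalize(anchor, dim=1)
    view = F.normalize(view, dim=1)
    logits = anchor @ view.t() / tau          # (N, N) cosine similarities
    labels = torch.arange(anchor.size(0))     # diagonal entries are positives
    return F.cross_entropy(logits, labels)

def multi_view_loss(view_embs, lam=0.5):
    # view_embs: list of (N, d) tensors, one per fragment-pair view
    # of the same batch of N molecules.
    fused = torch.stack(view_embs).mean(dim=0)   # assumed mean fusion
    contrast = sum(nt_xent(fused, z) for z in view_embs) / len(view_embs)
    # Auxiliary similarity loss: pull different views of the same
    # molecule toward each other (1 - mean cosine similarity per pair).
    aux = 0.0
    for i in range(len(view_embs)):
        for j in range(i + 1, len(view_embs)):
            aux = aux + (1.0 - F.cosine_similarity(view_embs[i], view_embs[j]).mean())
    return contrast + lam * aux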