Graph-based Molecular Representation Learning

Molecular representation learning (MRL) is a key step in connecting machine learning with chemical science. In particular, it encodes molecules as numerical vectors that preserve molecular structures and features, on top of which downstream tasks (e.g., property prediction) can be performed. Recently, MRL has made considerable progress, especially in methods based on deep molecular graph learning. In this survey, we systematically review these graph-based molecular representation techniques, with an emphasis on methods that incorporate chemical domain knowledge. Specifically, we first introduce the features of 2D and 3D molecular graphs. We then summarize MRL methods and categorize them into three groups based on their input. Furthermore, we discuss some typical chemical applications supported by MRL. To facilitate studies in this fast-developing area, we also list the benchmarks and commonly used datasets. Finally, we share our thoughts on future research directions.
