Quantitative evaluation of explainable graph neural networks for molecular property prediction

Advances in machine learning have led to graph neural network-based methods for drug discovery, yielding promising results in molecular design, chemical synthesis planning, and molecular property prediction. However, graph neural networks (GNNs) have seen limited acceptance in drug discovery because of their lack of interpretability. Although the development of explainable artificial intelligence (XAI) techniques has mitigated this major weakness, the “ground truth” assignment in most explanation tasks ultimately rests on subjective human judgment, so the quality of model interpretation is hard to evaluate quantitatively. In this work, we first build three levels of benchmark datasets to quantitatively assess the interpretability of state-of-the-art GNN models. We then implement recent XAI methods in combination with different GNN algorithms to highlight the benefits, limitations, and future opportunities for drug discovery. Our results show that GradInput and Integrated Gradients (IG) generally provide the best model interpretability for GNNs, especially when combined with GraphNet and CMPNN. The integrated XAI package is fully open-sourced and can be used by practitioners to train new models on other drug discovery tasks.
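
To make the two best-performing attribution methods concrete, the following is a minimal sketch of GradInput (gradient times input) and IG on a toy differentiable scorer. It is not the package described above and does not use a GNN: the model `tanh(x · w)`, the weights, the all-zero baseline, and every function name here are assumptions chosen so that the gradient can be written analytically.

```python
import numpy as np

def model(x, w):
    # Toy stand-in for a trained predictor (assumption, not a GNN).
    return np.tanh(x @ w)

def grad(x, w):
    # Analytic gradient of tanh(x . w) with respect to x.
    return (1.0 - np.tanh(x @ w) ** 2) * w

def grad_input(x, w):
    # GradInput: elementwise product of the input and its gradient.
    return x * grad(x, w)

def integrated_gradients(x, w, baseline=None, steps=200):
    # IG: average the gradient along the straight path from the baseline
    # to x (midpoint Riemann sum), then scale by (x - baseline).
    if baseline is None:
        baseline = np.zeros_like(x)  # all-zero baseline is an assumption
    alphas = (np.arange(steps) + 0.5) / steps
    path_grads = np.stack(
        [grad(baseline + a * (x - baseline), w) for a in alphas]
    )
    return (x - baseline) * path_grads.mean(axis=0)

# Hypothetical feature vector and weights; feature 2 has zero weight.
w = np.array([1.5, -2.0, 0.0, 0.5])
x = np.array([1.0, 0.2, 2.0, 1.0])

gi = grad_input(x, w)
ig = integrated_gradients(x, w)
```

In this sketch the IG attributions sum approximately to `model(x, w) - model(baseline, w)`, which is the completeness property of the method, and the feature with zero weight receives zero attribution under both methods. For a molecular graph, the same per-feature scores would typically be aggregated to atoms or bonds before being compared against ground-truth substructures.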
