DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery - A Focus on Affinity Prediction Problems with Noise Annotations

AI-aided drug discovery (AIDD) is gaining popularity due to its promise of making the search for new pharmaceuticals quicker, cheaper and more efficient. Although AIDD is used extensively in many fields, such as ADMET prediction, virtual screening, protein folding and generative chemistry, little has been explored in terms of the out-of-distribution (OOD) learning problem with noise, which is inevitable in real-world AIDD applications. In this work, we present DrugOOD, a systematic OOD dataset curator and benchmark for AI-aided drug discovery, which comes with an open-source Python package that fully automates the data curation and OOD benchmarking processes. We focus on one of the most crucial problems in AIDD: drug-target binding affinity prediction, which involves both a macromolecule (the protein target) and a small molecule (the drug compound). Rather than providing only fixed datasets, DrugOOD offers an automated dataset curator with user-friendly customization scripts, rich domain annotations aligned with biochemistry knowledge, realistic noise annotations and rigorous benchmarking of state-of-the-art OOD algorithms. Since molecular data is often modeled as irregular graphs using graph neural network (GNN) backbones, DrugOOD also serves as a valuable testbed for graph OOD learning problems. Extensive empirical studies show a significant performance gap between in-distribution and out-of-distribution experiments, which highlights the need to develop better schemes that allow for OOD generalization under noise for AIDD.
