Learning to Group Auxiliary Datasets for Molecule

The limited availability of annotations in small-molecule datasets presents a challenge to machine learning models. To address this, one common strategy is to train jointly with additional auxiliary datasets. However, more data does not always guarantee improvement: negative transfer can occur when the knowledge in the target dataset differs from, or even contradicts, that of the auxiliary molecule datasets. In light of this, identifying the auxiliary molecule datasets that benefit a target dataset under joint training remains a critical and unresolved problem. Through an empirical analysis, we observe that combining graph structure similarity and task similarity serves as a more reliable indicator for identifying high-affinity auxiliary datasets. Motivated by this insight, we propose MolGroup, which decomposes dataset affinity into task affinity and structure affinity to predict the potential benefit of each auxiliary molecule dataset. MolGroup achieves this through a routing mechanism trained within a bi-level optimization framework: guided by the meta gradient, the routing mechanism is optimized toward maximizing the target dataset's performance and quantifies the affinity of each auxiliary dataset as a gating score. As a result, MolGroup can predict the optimal combination of auxiliary datasets for each target dataset. Our extensive experiments demonstrate the efficiency and effectiveness of MolGroup, showing an average improvement of 4.41%/3.47% for GIN/Graphormer trained with the groups of molecule datasets selected by MolGroup on 11 target molecule datasets.
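To make the routing idea concrete, below is a minimal, hypothetical PyTorch sketch of a bi-level gating scheme in the spirit of the abstract: each auxiliary dataset receives a learnable gating score, the shared model takes an inner gradient step on the gate-weighted joint loss, and the gates are then updated by the meta gradient of the target loss through that inner step. All names, shapes, and hyperparameters (gate_logits, inner_lr, outer_lr, the toy linear model) are illustrative assumptions, not MolGroup's actual implementation.

```python
# Hypothetical sketch of bi-level gating over auxiliary datasets (not MolGroup's code).
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy setup: one target dataset and K auxiliary datasets, each a (features, labels) pair.
K, d, n = 3, 16, 64
target_x, target_y = torch.randn(n, d), torch.randint(0, 2, (n,)).float()
aux = [(torch.randn(n, d), torch.randint(0, 2, (n,)).float()) for _ in range(K)]

w = torch.zeros(d, requires_grad=True)            # shared model parameters (a linear probe)
gate_logits = torch.zeros(K, requires_grad=True)  # one routing score per auxiliary dataset

def loss_fn(params, x, y):
    # Binary classification loss of the linear model on one dataset.
    return F.binary_cross_entropy_with_logits(x @ params, y)

inner_lr, outer_lr = 0.1, 0.1
for step in range(100):
    gates = torch.sigmoid(gate_logits)  # affinity of each auxiliary dataset, in (0, 1)

    # Inner step: update the model on the target loss plus gate-weighted auxiliary losses.
    joint = loss_fn(w, target_x, target_y) + sum(
        g * loss_fn(w, x, y) for g, (x, y) in zip(gates, aux)
    )
    (grad_w,) = torch.autograd.grad(joint, w, create_graph=True)
    w_updated = w - inner_lr * grad_w  # kept differentiable w.r.t. gate_logits

    # Outer step: meta gradient of the target loss through the inner update.
    meta_loss = loss_fn(w_updated, target_x, target_y)
    (grad_g,) = torch.autograd.grad(meta_loss, gate_logits)
    with torch.no_grad():
        gate_logits -= outer_lr * grad_g
        w.copy_(w_updated)  # commit the inner update to the shared parameters

print("learned gating scores:", torch.sigmoid(gate_logits).tolist())
```

Under this reading, auxiliary datasets whose gating scores converge toward 1 are the ones whose gradients help the target loss, and would be the candidates grouped with the target dataset for joint training.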
