AANG: Automating Auxiliary Learning

Auxiliary objectives, supplementary learning signals introduced to aid learning on data-starved or highly complex end-tasks, are commonplace in machine learning. Whilst much work has been done to formulate useful auxiliary objectives, their construction remains an art that proceeds by slow and tedious hand-design. Intuitions about how and when these objectives improve end-task performance have also had limited theoretical backing. In this work, we present an approach for automatically generating a suite of auxiliary objectives. We achieve this by deconstructing existing objectives within a novel unified taxonomy, identifying connections between them, and generating new ones based on the uncovered structure. Next, we theoretically formalize widely-held intuitions about how auxiliary learning improves generalization on the end-task. This leads us to a principled and efficient algorithm for searching the space of generated objectives to find those most useful to a specified end-task. With natural language processing (NLP) as our domain of study, we demonstrate that our automated auxiliary learning pipeline yields strong improvements over competitive baselines in continued-training experiments with a pre-trained model on 5 NLP tasks.
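To make the generate-then-search idea concrete, the minimal Python sketch below enumerates candidate auxiliary objectives as combinations of a few taxonomy axes and ranks them with a placeholder relevance score. The axis names (data_source, input_transform, output_target), their options, and the scoring heuristic are illustrative assumptions for exposition only, not the taxonomy or search algorithm developed in the paper.

```python
# Illustrative sketch only: the axes and options below are assumptions, not the
# paper's actual taxonomy. It shows how a space of auxiliary objectives could be
# generated by recombining the building blocks of existing ones, then coarsely
# ranked for a given end-task.
from itertools import product
from typing import NamedTuple


class AuxObjective(NamedTuple):
    data_source: str       # where the auxiliary inputs come from
    input_transform: str   # how inputs are corrupted before prediction
    output_target: str     # what the model is trained to predict


# Hypothetical axis values; decomposing known objectives (e.g. masked or causal
# language modeling) along such axes lets new combinations emerge.
DATA_SOURCES = ["end_task_text", "in_domain_corpus", "out_of_domain_corpus"]
INPUT_TRANSFORMS = ["mask_tokens", "replace_tokens", "identity"]
OUTPUT_TARGETS = ["reconstruct_masked", "next_token", "detect_replaced"]


def generate_objectives():
    """Enumerate every combination of the (assumed) taxonomy axes."""
    return [AuxObjective(*combo)
            for combo in product(DATA_SOURCES, INPUT_TRANSFORMS, OUTPUT_TARGETS)]


def relevance_to_end_task(obj: AuxObjective) -> float:
    """Stand-in heuristic for how useful an objective might be to the end-task;
    the paper instead searches this space with a principled algorithm."""
    score = 0.0
    if obj.data_source == "end_task_text":
        score += 1.0   # end-task-aware signals tend to help
    if obj.input_transform != "identity":
        score += 0.5   # corruption provides a self-supervised training signal
    return score


if __name__ == "__main__":
    candidates = generate_objectives()
    top = sorted(candidates, key=relevance_to_end_task, reverse=True)[:3]
    print(f"{len(candidates)} generated objectives; top candidates:")
    for obj in top:
        print(" ", obj)
```

In the actual pipeline, the hand-written ranking step above would be replaced by the paper's search procedure, which weights generated objectives by their estimated effect on end-task generalization rather than by fixed heuristics.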
