LAMAL: LAnguage Modeling Is All You Need for Lifelong Language Learning

Most research on lifelong learning (LLL) applies to images or games, but not language. Here, we introduce LAMAL, a simple yet effective method for LLL based on language modeling. LAMAL replays pseudo samples of previous tasks while requiring no extra memory or model capacity. Specifically, LAMAL is a language model that simultaneously learns to solve tasks and to generate training samples. Before training on a new task, the model generates pseudo samples of previous tasks, which are trained on alongside the new task's data. The results show that LAMAL prevents catastrophic forgetting without any sign of intransigence and can solve up to five very different language tasks sequentially with only one model. Overall, LAMAL outperforms previous methods by a considerable margin and is only 2-3% worse than multitask learning, which is usually considered the upper bound of LLL. Our source code is available at this https URL.
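The replay loop described above fits in a few lines. Below is a minimal sketch, assuming a GPT-2-style model from Hugging Face transformers; the generation token, the generate_pseudo_samples helper, and the sampling_ratio parameter are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the generative-replay loop, assuming a GPT-2-style LM from
# Hugging Face transformers. GEN_TOKEN, generate_pseudo_samples, and
# sampling_ratio are illustrative names, not the authors' released code.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

GEN_TOKEN = "__gen__"  # prompt that asks the same LM to emit a pseudo sample

def generate_pseudo_samples(n, max_len=128):
    """Sample n pseudo examples of previously learned tasks from the LM itself."""
    prompt = tokenizer(GEN_TOKEN, return_tensors="pt").input_ids
    outputs = model.generate(prompt, do_sample=True, top_k=40, max_length=max_len,
                             num_return_sequences=n,
                             pad_token_id=tokenizer.eos_token_id)
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

def train_task(task_texts, sampling_ratio=0.05, epochs=1, first_task=False):
    """Train on one task; mix in LM-generated pseudo samples of earlier tasks."""
    n_pseudo = 0 if first_task else int(sampling_ratio * len(task_texts))
    pseudo = generate_pseudo_samples(n_pseudo) if n_pseudo > 0 else []
    model.train()
    for _ in range(epochs):
        for text in task_texts + pseudo:
            batch = tokenizer(text, return_tensors="pt", truncation=True)
            loss = model(**batch, labels=batch["input_ids"]).loss  # standard LM loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

# Tasks arrive sequentially; one model, no stored real data from earlier tasks:
# train_task(task1_texts, first_task=True)
# train_task(task2_texts)
# train_task(task3_texts)
```

Because the same language model both solves the tasks and produces the replay data, no separate generator and no stored examples from earlier tasks are needed.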
