Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining

Pre-trained neural language models bring significant improvements to various NLP tasks when fine-tuned on task-specific training sets. During standard fine-tuning, however, the parameters are initialized directly from the pre-trained model, which ignores how the learning processes of similar NLP tasks across different domains are correlated and mutually reinforcing. In this paper, we propose an effective learning procedure named Meta Fine-Tuning (MFT), which serves as a meta-learner that solves a group of similar NLP tasks for neural language models. Instead of simply multi-task training over all the datasets, MFT learns only from typical instances of the various domains to acquire highly transferable knowledge. It further encourages the language model to encode domain-invariant representations by optimizing a series of novel domain corruption loss functions. After MFT, the model can be fine-tuned for each domain with a better parameter initialization and higher generalization ability. We implement MFT upon BERT to solve several multi-domain text mining tasks. Experimental results confirm the effectiveness of MFT and its usefulness for few-shot learning.
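To make the procedure concrete, below is a minimal sketch of what one MFT-style training step on top of BERT could look like. It is not the authors' released implementation: it assumes the typical instances per domain have already been selected, treats the task as sentence classification, and approximates the domain corruption loss by classifying deliberately corrupted (randomly flipped) domain labels so the encoder cannot rely on domain-specific cues. The instance-selection strategy, the exact loss definitions, and the hyperparameters (`lam`, corruption rate `p`) are illustrative assumptions rather than the paper's formulation.

```python
# Sketch of an MFT-style step (assumptions noted above), not the authors' code.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

NUM_CLASSES, NUM_DOMAINS = 2, 3  # e.g. binary sentiment over three review domains

bert = BertModel.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
task_head = nn.Linear(bert.config.hidden_size, NUM_CLASSES)
domain_head = nn.Linear(bert.config.hidden_size, NUM_DOMAINS)
params = list(bert.parameters()) + list(task_head.parameters()) + list(domain_head.parameters())
optimizer = torch.optim.AdamW(params, lr=2e-5)
ce = nn.CrossEntropyLoss()

def corrupt(domain_ids, num_domains, p=0.5):
    """Randomly replace a fraction of domain labels (one way to 'corrupt' domain supervision)."""
    noisy = domain_ids.clone()
    mask = torch.rand_like(domain_ids, dtype=torch.float) < p
    noisy[mask] = torch.randint(0, num_domains, (int(mask.sum()),))
    return noisy

def mft_step(texts, labels, domain_ids, lam=0.1):
    """One meta fine-tuning step over a mixed batch of typical instances from all domains."""
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    cls = bert(**enc).last_hidden_state[:, 0]                # [CLS] representation
    loss = ce(task_head(cls), labels)                        # shared task loss
    loss = loss + lam * ce(domain_head(cls), corrupt(domain_ids, NUM_DOMAINS))  # domain corruption term
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical usage with toy data:
# texts = ["great battery life", "boring plot", "tasty food"]
# labels = torch.tensor([1, 0, 1]); domain_ids = torch.tensor([0, 1, 2])
# mft_step(texts, labels, domain_ids)
```

After the MFT stage, the updated BERT weights would serve as the initialization for ordinary per-domain fine-tuning with only the task head, which is where the reported gains over direct fine-tuning are measured.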
