Transfer Fine-Tuning: A BERT Case Study

Semantic equivalence assessment determines whether two sentences convey the same meaning, either by binary judgment (i.e., paraphrase identification) or by grading (i.e., semantic textual similarity measurement). These tasks are central to research on natural language understanding. Recently, BERT achieved a breakthrough in sentence representation learning (Devlin et al., 2019) that transfers broadly to various NLP tasks. Although BERT's performance improves with model size, the computational cost of larger models hinders their adoption in practical applications. Instead of increasing the model size, we propose injecting phrasal paraphrase relations into BERT so that it generates representations suited to semantic equivalence assessment. Experiments on standard natural language understanding tasks confirm that our method effectively improves a smaller BERT model while keeping its size unchanged. The resulting model outperforms a larger BERT model on semantic equivalence assessment tasks. Furthermore, it achieves larger performance gains on tasks with limited training data for fine-tuning, a property desirable for transfer learning.

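To make the training recipe concrete, the sketch below illustrates the general idea of an intermediate fine-tuning stage: a pre-trained BERT encoder is first trained to classify paraphrase pairs before being fine-tuned on the downstream task. This is a minimal sketch under stated assumptions, not the paper's exact method: the model name, optimizer settings, and toy data are illustrative choices using the HuggingFace transformers API, and the sketch uses sentence-level pairs for brevity, whereas the paper injects phrasal (sub-sentential) paraphrase relations.

```python
# Minimal sketch of an intermediate "transfer fine-tuning" stage on paraphrase
# classification before task-specific fine-tuning. The model name, optimizer
# settings, and toy pairs are illustrative assumptions, not the paper's setup
# (which injects *phrasal* paraphrase relations rather than sentence labels).
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # 1 = paraphrase, 0 = not a paraphrase
)
optimizer = AdamW(model.parameters(), lr=2e-5)

# Toy pairs standing in for a paraphrase corpus.
pairs = [
    ("the film was great", "the movie was excellent", 1),
    ("the film was great", "the weather turned cold", 0),
]

model.train()
for text_a, text_b, label in pairs:
    # BERT encodes the pair jointly as "[CLS] a [SEP] b [SEP]".
    inputs = tokenizer(text_a, text_b, return_tensors="pt",
                       padding=True, truncation=True)
    loss = model(**inputs, labels=torch.tensor([label])).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# The updated encoder weights are then reused: fine-tune this model (or its
# base encoder) on the downstream semantic equivalence task as usual.
```

The key design point is that the paraphrase-classification stage only updates the encoder's weights; the model's size and architecture are unchanged, so the downstream fine-tuning procedure is identical to standard BERT fine-tuning.
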
[1]  Zhilin Yang, et al.  XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.

[2]  Adina Williams, et al.  A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017, NAACL.

[3]  Jacob Devlin, et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[4]  David Kauchak, et al.  Improving Text Simplification Language Modeling Using Unsimplified Text Data, 2013, ACL.

[5]  Kai Sheng Tai, et al.  Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks, 2015, ACL.

[6]  Swabha Swayamdipta, et al.  Syntactic Scaffolds for Semantic Structures, 2018, EMNLP.

[7]  Matthew E. Peters, et al.  Deep Contextualized Word Representations, 2018, NAACL.

[8]  Alex Warstadt, et al.  Neural Network Acceptability Judgments, 2018, TACL.

[9]  Yukun Zhu, et al.  Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books, 2015, ICCV.

[10]  Sandeep Subramanian, et al.  Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning, 2018, ICLR.

[11]  Shrimai Prabhumoye, et al.  Style Transfer Through Back-Translation, 2018, ACL.

[12]  Alex Wang, et al.  GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding, 2018, BlackboxNLP@EMNLP.

[13]  Yuki Arase, et al.  Monolingual Phrase Alignment on Parse Forests, 2017, EMNLP.

[14]  Bill Dolan, et al.  Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources, 2004, COLING.

[15]  John Wieting, et al.  Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations, 2017, arXiv.

[16]  Wei Wu, et al.  Phrase-level Self-Attention Networks for Universal Sentence Encoding, 2018, EMNLP.

[17]  Xiaodong Liu, et al.  Multi-Task Deep Neural Networks for Natural Language Understanding, 2019, ACL.

[18]  Daniel Cer, et al.  SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation, 2017, SemEval.

[19]  Hua He, et al.  Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement, 2016, NAACL.

[20]  Yonghui Wu, et al.  Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, arXiv.

[21]  Ashish Vaswani, et al.  Attention is All you Need, 2017, NIPS.

[22]  Jason Phang, et al.  Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks, 2018, arXiv.

[23]  Wuwei Lan, et al.  A Continuously Growing Dataset of Sentential Paraphrases, 2017, EMNLP.

[24]  Richard Socher, et al.  Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, 2013, EMNLP.

[25]  Alexis Conneau, et al.  Supervised Learning of Universal Sentence Representations from Natural Language Inference Data, 2017, EMNLP.

[26]  Sergey Edunov, et al.  Understanding Back-Translation at Scale, 2018, EMNLP.

[27]  Lajanugen Logeswaran, et al.  An efficient framework for learning sentence representations, 2018, ICLR.

[28]  Diederik P. Kingma, et al.  Adam: A Method for Stochastic Optimization, 2014, ICLR.

[29]  Qian Chen, et al.  Enhanced LSTM for Natural Language Inference, 2016, ACL.

[30]  Pranav Rajpurkar, et al.  SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.

[31]  Ryan Kiros, et al.  Skip-Thought Vectors, 2015, NIPS.

[32]  Alec Radford, et al.  Improving Language Understanding by Generative Pre-Training, 2018.

[33]  Alec Radford, et al.  Language Models are Unsupervised Multitask Learners, 2019.