SimpleTran: Transferring Pre-Trained Sentence Embeddings for Low Resource Text Classification

Fine-tuning pre-trained sentence embedding models like BERT has become the default transfer learning approach for several NLP tasks such as text classification. We propose an alternative transfer learning approach called SimpleTran, which is simple and effective for low-resource text classification characterized by small datasets. We train a simple sentence embedding model on the target dataset, combine its output embedding with that of the pre-trained model via concatenation or dimension reduction, and finally train a classifier on the combined embedding, either keeping the embedding model weights fixed or training the classifier and the embedding models end-to-end. With the embeddings kept fixed, SimpleTran significantly improves over fine-tuning on small datasets while being more computationally efficient. With end-to-end training, SimpleTran outperforms fine-tuning on small and medium-sized datasets with negligible computational overhead. We provide a theoretical analysis of our method, identifying the conditions under which it has advantages.
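The following is a minimal PyTorch sketch of the concatenation variant described above, assuming a frozen "bert-base-uncased" encoder as the pre-trained model and an averaged word-embedding encoder as the simple model trained from scratch; the class names, dimensions, and encoder choices are illustrative assumptions rather than the authors' exact configuration.

```python
# Minimal sketch of SimpleTran (concatenation variant), under the assumptions stated above.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class SimpleEncoder(nn.Module):
    """Small sentence encoder trained from scratch on the target dataset
    (here: averaged word embeddings; the paper's simple model may differ)."""

    def __init__(self, vocab_size: int, dim: int = 128):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim, mode="mean")

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.embed(token_ids)  # (batch, dim)


class CombinedClassifier(nn.Module):
    """Concatenate the pre-trained sentence embedding with the simple one,
    then classify. Set freeze_pretrained=False for end-to-end training."""

    def __init__(self, vocab_size: int, num_classes: int,
                 freeze_pretrained: bool = True, simple_dim: int = 128):
        super().__init__()
        self.pretrained = AutoModel.from_pretrained("bert-base-uncased")
        if freeze_pretrained:
            for p in self.pretrained.parameters():
                p.requires_grad = False
        self.simple = SimpleEncoder(vocab_size, simple_dim)
        combined_dim = self.pretrained.config.hidden_size + simple_dim
        self.classifier = nn.Linear(combined_dim, num_classes)

    def forward(self, bert_inputs: dict, simple_token_ids: torch.Tensor):
        # Use the [CLS] vector as the pre-trained sentence embedding.
        cls_vec = self.pretrained(**bert_inputs).last_hidden_state[:, 0]
        simple_vec = self.simple(simple_token_ids)
        combined = torch.cat([cls_vec, simple_vec], dim=-1)  # concatenation variant
        return self.classifier(combined)


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    texts = ["a tiny example sentence", "another short example"]
    bert_inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    # For illustration only, the simple encoder reuses BERT's token ids;
    # in practice it would have its own vocabulary built from the target data.
    model = CombinedClassifier(vocab_size=tokenizer.vocab_size, num_classes=2)
    logits = model(bert_inputs, bert_inputs["input_ids"])
    print(logits.shape)  # torch.Size([2, 2])
```

The dimension-reduction variant mentioned in the abstract would replace the concatenation step with a projection (e.g., CCA or kernel PCA) of the two embeddings before classification.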
