DILBERT: Customized Pre-Training for Domain Adaptation with Category Shift, with an Application to Aspect Extraction

The rise of pre-trained language models has yielded substantial progress across the vast majority of Natural Language Processing (NLP) tasks. However, a generic pre-training procedure can naturally be sub-optimal in some cases. In particular, fine-tuning a pre-trained language model on a source domain and then applying it to a different target domain results in a sharp performance decline of the eventual classifier for many source-target domain pairs. Moreover, in some NLP tasks the output categories differ substantially between domains, making adaptation even more challenging. This happens, for example, in aspect extraction, where the aspects of interest in reviews of, say, restaurants and electronic devices may be very different. This paper presents a new fine-tuning scheme for BERT that addresses these challenges. We name this scheme DILBERT: Domain Invariant Learning with BERT, and customize it for aspect extraction in the unsupervised domain adaptation setting. DILBERT harnesses the categorical information of both the source and the target domains to guide the pre-training process towards a more domain- and category-invariant representation, thus closing the gap between the domains. We show that DILBERT yields substantial improvements over state-of-the-art baselines while using a fraction of the unlabeled data, particularly in more challenging domain adaptation setups.
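The abstract does not spell out the mechanism, but one plausible reading of "harnessing categorical information to guide the pre-training process" is a category-guided masking step: tokens semantically close to a domain's category names are preferentially masked during masked language modeling, so the model is pushed to predict exactly the category-bearing words. The sketch below is purely illustrative, not the paper's implementation; `category_guided_mask` and `toy_sim` are hypothetical names, and a real system would score tokens with embedding cosine similarity rather than the string-match stand-in used here.

```python
def category_guided_mask(tokens, category_terms, similarity,
                         mask_token="[MASK]", threshold=0.5):
    """Mask tokens that are semantically close to any domain category term,
    so the MLM objective focuses on category-related words.

    Returns the masked token sequence and per-position labels
    (the original token where masked, None elsewhere, i.e. ignored by the loss).
    """
    masked, labels = [], []
    for tok in tokens:
        score = max(similarity(tok, c) for c in category_terms)
        if score >= threshold:
            masked.append(mask_token)
            labels.append(tok)      # model must recover the category word
        else:
            masked.append(tok)
            labels.append(None)     # position excluded from the MLM loss
    return masked, labels

# Toy similarity: substring match stands in for embedding cosine similarity.
def toy_sim(tok, cat):
    return 1.0 if cat in tok or tok in cat else 0.0

tokens = "the pasta was great but the service was slow".split()
masked, labels = category_guided_mask(
    tokens, ["food", "service", "pasta"], toy_sim)
# "pasta" and "service" are masked; function words are left intact.
```

Because the masking decision depends only on unlabeled text plus the category names of the source and target domains, such a step could run on raw reviews from both domains before any task-specific fine-tuning.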
