Evaluation of BERT and ALBERT Sentence Embedding Performance on Downstream NLP Tasks

Contextualized representations from a pre-trained language model are central to achieving high performance on downstream NLP tasks. The pre-trained BERT and A Lite BERT (ALBERT) models can be fine-tuned to give state-of-the-art results in sentence-pair regression tasks such as semantic textual similarity (STS) and natural language inference (NLI). Although BERT-based models yield the [CLS] token vector as a reasonable sentence embedding, the search for an optimal sentence-embedding scheme remains an active research area in computational linguistics. This paper explores sentence-embedding models for BERT and ALBERT. In particular, we take a modified BERT network with siamese and triplet network structures, called Sentence-BERT (SBERT), and replace BERT with ALBERT to create Sentence-ALBERT (SALBERT). We also experiment with an outer CNN sentence-embedding network for SBERT and SALBERT. We evaluate the performance of all the sentence-embedding models considered on the STS and NLI datasets. The empirical results indicate that our CNN architecture improves the ALBERT models substantially more than the BERT models on the STS benchmark. Despite having significantly fewer model parameters, ALBERT sentence embeddings are highly competitive with BERT in downstream NLP evaluations.
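
To make the siamese setup and the CNN pooling head concrete, the sketch below is a minimal, hypothetical PyTorch/Hugging Face implementation, not the authors' code: an ALBERT (or BERT) encoder produces token embeddings, a 1-D convolution followed by max pooling reduces them to a fixed-size sentence vector, and two sentences are compared with cosine similarity. The model name, kernel size, and output width are illustrative assumptions.

    # Minimal sketch (assumed configuration, not the paper's exact model):
    # siamese sentence encoder = pre-trained transformer + CNN pooling head.
    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    class CNNSentenceEncoder(nn.Module):
        def __init__(self, model_name="albert-base-v2", kernel_size=3, out_channels=256):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(model_name)  # BERT or ALBERT
            hidden = self.encoder.config.hidden_size
            # 1-D convolution over the token dimension; padding keeps sequence length.
            self.conv = nn.Conv1d(hidden, out_channels, kernel_size, padding=kernel_size // 2)

        def forward(self, input_ids, attention_mask):
            # Token embeddings: (batch, seq_len, hidden)
            token_states = self.encoder(input_ids=input_ids,
                                        attention_mask=attention_mask).last_hidden_state
            # Conv1d expects (batch, hidden, seq_len)
            feats = torch.relu(self.conv(token_states.transpose(1, 2)))
            # Exclude padding positions from the max pooling.
            mask = attention_mask.unsqueeze(1).bool()
            feats = feats.masked_fill(~mask, float("-inf"))
            # Fixed-size sentence embedding: (batch, out_channels)
            return feats.max(dim=-1).values

    # Siamese use: the same encoder embeds both sentences; similarity is the cosine score.
    tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
    model = CNNSentenceEncoder()
    batch = tokenizer(["A man is playing a guitar.", "Someone plays guitar."],
                      padding=True, return_tensors="pt")
    with torch.no_grad():
        emb = model(batch["input_ids"], batch["attention_mask"])
    score = torch.cosine_similarity(emb[0], emb[1], dim=0)

In an STS-style evaluation, such a cosine score would be correlated with the human similarity ratings; swapping "albert-base-v2" for a BERT checkpoint gives the corresponding SBERT-style variant.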
