PromptBERT: Improving BERT Sentence Embeddings with Prompts

The poor performance of the original BERT for sentence semantic similarity has been widely discussed in previous works. We find that the unsatisfactory performance is mainly due to biases in the static token embeddings and ineffective BERT layers, rather than to the high cosine similarity of the sentence embeddings. To this end, we propose a prompt-based sentence embedding method that reduces token embedding biases and makes the original BERT layers more effective. By reformulating the sentence embedding task as a fill-in-the-blanks problem, our method significantly improves the performance of the original BERT. We discuss two prompt representation methods and three prompt searching methods for prompt-based sentence embeddings. Moreover, we propose a novel unsupervised training objective based on template denoising, which substantially narrows the performance gap between the supervised and unsupervised settings. We evaluate our method in both non-fine-tuned and fine-tuned settings. Even the non-fine-tuned method outperforms fine-tuned methods such as unsupervised ConSERT on STS tasks, and our fine-tuned method outperforms the state-of-the-art method SimCSE in both unsupervised and supervised settings. Compared to SimCSE, we achieve improvements of 2.29 and 2.58 points on BERT and RoBERTa, respectively, in the unsupervised setting. Our code is available at https://github.com/kongds/Prompt-BERT.
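The fill-in-the-blank reformulation can be illustrated with a minimal sketch: wrap the input sentence in a prompt template containing a [MASK] token and take the hidden state at the [MASK] position as the sentence embedding (non-fine-tuned setting). The specific template string, model checkpoint, and helper function below are illustrative assumptions for a HuggingFace transformers setup, not necessarily the paper's exact configuration.

```python
# Sketch: prompt-based sentence embedding via the [MASK] hidden state.
# Template, model name, and function name are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def prompt_embedding(sentence: str) -> torch.Tensor:
    # Wrap the sentence in a fill-in-the-blank style template.
    text = f'This sentence : "{sentence}" means [MASK] .'
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, hidden_dim)
    # Locate the [MASK] position and use its hidden state as the embedding.
    mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)
    return hidden[mask_pos][0]

a = prompt_embedding("A man is playing a guitar.")
b = prompt_embedding("Someone is playing an instrument.")
print(torch.cosine_similarity(a, b, dim=0).item())
```

In this sketch, semantically related sentences should yield a higher cosine similarity than unrelated ones even without any fine-tuning, which is the non-fine-tuned setting the abstract refers to.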
