SimCSE: Simple Contrastive Learning of Sentence Embeddings

This paper presents SimCSE, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings. We first describe an unsupervised approach, which takes an input sentence and predicts itself in a contrastive objective, with only standard dropout used as noise. This simple method works surprisingly well, performing on par with previous supervised counterparts. We hypothesize that dropout acts as minimal data augmentation and that removing it leads to representation collapse. We then draw inspiration from the recent success of learning sentence embeddings from natural language inference (NLI) datasets and incorporate annotated pairs from NLI datasets into contrastive learning by using “entailment” pairs as positives and “contradiction” pairs as hard negatives. We evaluate SimCSE on standard semantic textual similarity (STS) tasks, and our unsupervised and supervised models using BERTbase achieve an average of 74.5% and 81.6% Spearman’s correlation respectively, a 7.9 and 4.6 point improvement over previous best results. We also show that contrastive learning theoretically regularizes pretrained embeddings’ anisotropic space to be more uniform, and that it better aligns positive pairs when supervised signals are available.
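The core idea of the unsupervised variant can be illustrated with a minimal sketch: encode the same batch of sentences twice with dropout active, so each sentence forms a positive pair with its own second encoding, and train with an in-batch contrastive (InfoNCE-style) loss. The snippet below is an illustrative assumption of how such an objective can be written with Hugging Face Transformers and PyTorch, using [CLS] pooling and a temperature of 0.05; it is not the authors' exact training code.

```python
# Minimal sketch of a dropout-as-augmentation contrastive objective (unsupervised SimCSE-style).
# Assumptions: bert-base-uncased encoder, [CLS] pooling, temperature 0.05, in-batch negatives.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.train()  # keep dropout active: the two forward passes below must differ

sentences = ["A man is playing a guitar.", "Two dogs run in a field."]
batch = tokenizer(sentences, padding=True, return_tensors="pt")

# Encode the same batch twice; standard dropout is the only "augmentation",
# so each sentence yields two slightly different embeddings (a positive pair).
z1 = model(**batch).last_hidden_state[:, 0]  # [CLS] embeddings, pass 1
z2 = model(**batch).last_hidden_state[:, 0]  # [CLS] embeddings, pass 2

# In-batch InfoNCE: for each sentence, its second-pass embedding is the positive
# and every other sentence in the batch serves as a negative.
temperature = 0.05
sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / temperature
labels = torch.arange(sim.size(0))
loss = F.cross_entropy(sim, labels)
loss.backward()
```

The supervised variant described above keeps the same in-batch loss but replaces the dropout-based positive with an NLI "entailment" hypothesis and appends the corresponding "contradiction" hypothesis as an extra hard negative in the denominator.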
