S-SimCSE: Sampled Sub-networks for Contrastive Learning of Sentence Embedding

Contrastive learning has been studied for improving the performance of sentence embedding learning. The current state-of-the-art method, SimCSE, uses dropout as a data augmentation method and feeds the same input sentence to a pre-trained Transformer encoder twice. The two sentence embeddings derived from different dropout masks then form a positive pair. A network to which a dropout mask is applied can be regarded as a sub-network of itself, whose expected scale is determined by the dropout rate. In this paper, we push sub-networks with different expected scales to learn similar embeddings for the same sentence. SimCSE cannot do this because it fixes the dropout rate to a tuned value, whereas we sample a dropout rate for each dropout function. As this increases the difficulty of optimization, we also propose a simple sentence-wise mask strategy to sample more sub-networks. We evaluate the proposed S-SimCSE on several popular semantic textual similarity (STS) datasets. Experimental results show that S-SimCSE outperforms the state-of-the-art SimCSE by more than 1% on BERT-base.
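
To make the core idea concrete, the sketch below shows one way a dropout rate could be re-sampled on each forward pass, so that the two encodings of a sentence come from sub-networks with different expected scales. This is a minimal PyTorch illustration only; the module name `SampledDropout`, the uniform sampling range, and the toy encoder are assumptions, not the paper's exact implementation.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F


class SampledDropout(nn.Module):
    """Dropout whose rate is re-sampled on every forward call.

    Each call draws a rate uniformly from [low, high] (an assumed
    range), so repeated passes over the same sentence go through
    sub-networks of different expected scales.
    """

    def __init__(self, low=0.0, high=0.2):
        super().__init__()
        self.low, self.high = low, high

    def forward(self, x):
        if not self.training:
            return x
        p = random.uniform(self.low, self.high)  # sample a dropout rate per call
        return F.dropout(x, p=p, training=True)


# Usage: feed the same batch twice; the two outputs come from
# differently sampled sub-networks and can be paired as positives
# in a contrastive loss (encoder here is a toy stand-in).
encoder = nn.Sequential(nn.Linear(768, 768), SampledDropout())
encoder.train()
x = torch.randn(4, 768)             # 4 sentence representations
z1, z2 = encoder(x), encoder(x)     # two views of the same sentences
```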