Structural-Aware Sentence Similarity with Recursive Optimal Transport

Measuring sentence similarity is a classic topic in natural language processing. Lightweight similarity measures remain of particular practical significance even as deep learning models have succeeded in many other tasks. Some lightweight similarities with stronger theoretical grounding have even been shown to outperform supervised deep learning approaches. However, successful lightweight models such as Word Mover's Distance [Kusner et al., 2015] and Smooth Inverse Frequency [Arora et al., 2017] fail to detect differences in sentence structure, i.e., word order. To address this issue, we present the Recursive Optimal Transport (ROT) framework, which incorporates structural information into classic optimal transport (OT). Building on the semantic insight that connects the cosine similarity of weighted averages of word vectors to optimal transport, we further develop Recursive Optimal Transport Similarity (ROTS) for sentences. ROTS is structure-aware and has low time complexity compared to optimal transport. Our experiments over 20 semantic textual similarity (STS) datasets show a clear advantage of ROTS over all weakly supervised approaches. A detailed ablation study demonstrates the effectiveness of ROT and of the semantic insights.
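To see concretely why averaging-based similarities ignore word order, consider the following minimal sketch of the Smooth Inverse Frequency idea [Arora et al., 2017]. It is an illustration, not the paper's implementation: the toy vocabulary, the random stand-in word vectors, the unigram probabilities, and helper names such as `sif_weight` are assumptions, and SIF's common-component removal step is omitted. Because the sentence embedding is a weighted average of word vectors, any permutation of the same words produces the identical vector, so the cosine similarity is exactly 1.

```python
import numpy as np

def sif_weight(freq, a=1e-3):
    # SIF weight a / (a + p(w)) from Arora et al. [2017]; a is a smoothing constant.
    return a / (a + freq)

def sentence_embedding(words, vectors, freqs):
    # Weighted average of word vectors; permutation-invariant by construction.
    w = np.array([sif_weight(freqs[t]) for t in words])
    V = np.stack([vectors[t] for t in words])
    return (w[:, None] * V).sum(axis=0) / w.sum()

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
vocab = ["dog", "bites", "man"]
vectors = {t: rng.normal(size=50) for t in vocab}   # stand-ins for GloVe vectors
freqs = {"dog": 2e-4, "bites": 1e-5, "man": 3e-4}   # made-up unigram probabilities

s1 = ["dog", "bites", "man"]
s2 = ["man", "bites", "dog"]  # same bag of words, different meaning
print(cosine(sentence_embedding(s1, vectors, freqs),
             sentence_embedding(s2, vectors, freqs)))  # prints 1.0
```

This insensitivity to word order is exactly the degeneracy that the ROT framework is designed to address.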
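Word Mover's Distance is order-insensitive for the same reason: it treats each sentence as a bag of embedded words and solves an optimal transport problem between the two bags, so reordering words merely permutes rows of the cost matrix and leaves the optimal cost unchanged. Below is a hedged sketch that uses the entropy-regularized Sinkhorn iteration [Cuturi, 2013] rather than an exact OT solver; `sinkhorn_cost`, `wmd`, the uniform word weights, and the toy vectors are illustrative assumptions rather than the paper's code.

```python
import numpy as np

def sinkhorn_cost(a, b, C, reg=0.05, n_iter=500):
    # Entropy-regularized OT (Sinkhorn iteration, Cuturi [2013]): alternately
    # rescale rows and columns of K = exp(-C/reg) until the transport plan T
    # matches the marginals a and b, then return the transport cost <T, C>.
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    T = u[:, None] * K * v[None, :]
    return float((T * C).sum())

rng = np.random.default_rng(0)
vectors = {t: rng.normal(size=50) for t in ["dog", "bites", "man"]}  # toy vectors

def wmd(s1, s2):
    # Uniform weights stand in for the normalized bag-of-words frequencies of
    # Kusner et al. [2015]; ground costs are Euclidean distances between vectors.
    a = np.full(len(s1), 1.0 / len(s1))
    b = np.full(len(s2), 1.0 / len(s2))
    C = np.array([[np.linalg.norm(vectors[x] - vectors[y]) for y in s2] for x in s1])
    return sinkhorn_cost(a, b, C)

print(wmd(["dog", "bites", "man"], ["man", "bites", "dog"]))  # ~0: order ignored
```

Each word is transported to its zero-cost duplicate in the other sentence, so the distance is (approximately) zero no matter how the words are arranged.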

References

[1] Matteo Pagliardini et al. Unsupervised Learning of Sentence Embeddings Using Compositional n-Gram Features, 2017, NAACL.

[2] David M. Blei et al. Latent Dirichlet Allocation, 2003, J. Mach. Learn. Res.

[3] Eneko Agirre et al. SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation, 2017, *SEMEVAL.

[4] John Wieting et al. Towards Universal Paraphrastic Sentence Embeddings, 2015, ICLR.

[5] Kawin Ethayarajh. Unsupervised Random Walk Sentence Embeddings: A Strong but Simple Baseline, 2018, Rep4NLP@ACL.

[6] Sanjeev Arora et al. A Simple but Tough-to-Beat Baseline for Sentence Embeddings, 2017, ICLR.

[7] Jeffrey Pennington et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[8] Ryan Kiros et al. Skip-Thought Vectors, 2015, NIPS.

[9] Eneko Agirre et al. SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity, 2012, *SEMEVAL.

[10] Vitalii Zhelezniak et al. Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors, 2019, ICLR.

[11] Eneko Agirre et al. SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability, 2015, *SEMEVAL.

[12] Gao Huang et al. Supervised Word Mover's Distance, 2016, NIPS.

[13] John Wieting et al. Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations, 2017, arXiv.

[14] Quoc V. Le et al. Distributed Representations of Sentences and Documents, 2014, ICML.

[15] Ziyi Yang et al. Parameter-free Sentence Embedding via Orthogonal Basis, 2019, EMNLP/IJCNLP.

[16] Yinfei Yang et al. Learning Semantic Textual Similarity from Conversations, 2018, Rep4NLP@ACL.

[17] Lingfei Wu et al. Word Mover's Embedding: From Word2Vec to Document Embedding, 2018, EMNLP.

[18] Eneko Agirre et al. *SEM 2013 shared task: Semantic Textual Similarity, 2013, *SEMEVAL.

[19] Tomas Mikolov et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.

[20] Eneko Agirre et al. SemEval-2014 Task 10: Multilingual Semantic Textual Similarity, 2014, *SEMEVAL.

[21] John Wieting et al. From Paraphrase Database to Compositional Paraphrase Model and Back, 2015, TACL.

[22] Sidak Pal Singh et al. Context Mover's Distance & Barycenters: Optimal Transport of Contexts for Building Representations, 2018, DGS@ICLR.

[23] Marco Cuturi. Sinkhorn Distances: Lightspeed Computation of Optimal Transport, 2013, NIPS.

[24] John Wieting et al. Revisiting Recurrent Networks for Paraphrastic Sentence Embeddings, 2017, ACL.

[25] Matt J. Kusner et al. From Word Embeddings to Document Distances, 2015, ICML.

[26] Jacob Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.