Word Rotator’s Distance

A key principle in assessing textual similarity is measuring the degree of semantic overlap between two texts by considering the word alignment. Such alignment-based approaches are intuitive and interpretable; however, they are empirically inferior to the simple cosine similarity between general-purpose sentence vectors. To address this issue, we focus on and demonstrate the fact that the norm of word vectors is a good proxy for word importance, and their angle is a good proxy for word similarity. Alignment-based approaches do not distinguish them, whereas sentence-vector approaches automatically use the norm as the word importance. Accordingly, we propose a method that first decouples word vectors into their norm and direction, and then computes alignment-based similarity using earth mover's distance (i.e., optimal transport cost), which we refer to as word rotator's distance. Besides, we find how to grow the norm and direction of word vectors (vector converter), which is a new systematic approach derived from sentence-vector estimation methods. On several textual similarity datasets, the combination of these simple proposed methods outperformed not only alignment-based approaches but also strong baselines. The source code is available at this https URL

[1]  Ramesh Nallapati,et al.  Universal Text Representation from BERT: An Empirical Study , 2019, ArXiv.

[2]  Kevin Gimpel,et al.  From Paraphrase Database to Compositional Paraphrase Model and Back , 2015, Transactions of the Association for Computational Linguistics.

[3]  Thorsten Brants,et al.  One billion word benchmark for measuring progress in statistical language modeling , 2013, INTERSPEECH.

[4]  Noah A. Smith,et al.  Sentence Mover’s Similarity: Automatic Evaluation for Multi-Sentence Texts , 2019, ACL.

[5]  Ehud Rivlin,et al.  Placing search in context: the concept revisited , 2002, TOIS.

[6]  Tianlin Liu,et al.  Continual Learning for Sentence Representations Using Conceptors , 2019, NAACL.

[7]  Graham Neubig,et al.  Beyond BLEU:Training Neural Machine Translation with Semantic Similarity , 2019, ACL.

[8]  Mikhail Khodak,et al.  A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors , 2018, ACL.

[9]  Nicolas Courty,et al.  Optimal Transport for structured data with application on graphs , 2018, ICML.

[10]  Andrew Y. Ng,et al.  Improving Word Representations via Global Context and Multiple Word Prototypes , 2012, ACL.

[11]  Nan Hua,et al.  Universal Sentence Encoder for English , 2018, EMNLP.

[12]  Matteo Pagliardini,et al.  Unsupervised Learning of Sentence Embeddings Using Compositional n-Gram Features , 2017, NAACL.

[13]  Steven Bird,et al.  NLTK: The Natural Language Toolkit , 2002, ACL.

[14]  Kawin Ethayarajh,et al.  Unsupervised Random Walk Sentence Embeddings: A Strong but Simple Baseline , 2018, Rep4NLP@ACL.

[15]  R'emi Louf,et al.  HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.

[16]  Eneko Agirre,et al.  SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation , 2017, *SEMEVAL.

[17]  Kevin Gimpel,et al.  Towards Universal Paraphrastic Sentence Embeddings , 2015, ICLR.

[18]  Chris Callison-Burch,et al.  SemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter (PIT) , 2015, *SEMEVAL.

[19]  Timothy Dozat,et al.  Universal Dependency Parsing from Scratch , 2019, CoNLL.

[20]  Vitalii Zhelezniak,et al.  Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors , 2019, ICLR.

[21]  Matt J. Kusner,et al.  From Word Embeddings To Document Distances , 2015, ICML.

[22]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[23]  Benjamin J. Wilson,et al.  Measuring Word Significance using Distributed Representations of Words , 2015, ArXiv.

[24]  Felix Hill,et al.  SimLex-999: Evaluating Semantic Models With (Genuine) Similarity Estimation , 2014, CL.

[25]  Yiming Yang,et al.  XLNet: Generalized Autoregressive Pretraining for Language Understanding , 2019, NeurIPS.

[26]  Alexander Panchenko,et al.  How much does a word weigh? Weighting word embeddings for word sense induction , 2018, ArXiv.

[27]  Claire Cardie,et al.  SemEval-2014 Task 10: Multilingual Semantic Textual Similarity , 2014, *SEMEVAL.

[28]  Steven Bethard,et al.  DLS@CU: Sentence Similarity from Word Alignment and Semantic Vector Composition , 2015, *SEMEVAL.

[29]  G. Miller,et al.  Contextual correlates of semantic similarity , 1991 .

[30]  Steven Bethard,et al.  DLS@CU: Sentence Similarity from Word Alignment , 2014, *SEMEVAL.

[31]  Marco Cuturi,et al.  Sinkhorn Distances: Lightspeed Computation of Optimal Transport , 2013, NIPS.

[32]  Gabriel Peyré,et al.  Computational Optimal Transport , 2018, Found. Trends Mach. Learn..

[33]  Fei Liu,et al.  MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance , 2019, EMNLP.

[34]  Filippo Santambrogio,et al.  Optimal Transport for Applied Mathematicians , 2015 .

[35]  John B. Goodenough,et al.  Contextual correlates of synonymy , 1965, CACM.

[36]  Sanjeev Arora,et al.  A Simple but Tough-to-Beat Baseline for Sentence Embeddings , 2017, ICLR.

[37]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[38]  Christopher D. Manning,et al.  Better Word Representations with Recursive Neural Networks for Morphology , 2013, CoNLL.

[39]  Luke S. Zettlemoyer,et al.  Deep Contextualized Word Representations , 2018, NAACL.

[40]  Eneko Agirre,et al.  SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity , 2012, *SEMEVAL.

[41]  Tomas Mikolov,et al.  Enriching Word Vectors with Subword Information , 2016, TACL.

[42]  Kevin Gimpel,et al.  Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations , 2017, ArXiv.

[43]  F. Santambrogio Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling , 2015 .

[44]  Tianlin Liu,et al.  Unsupervised Post-processing of Word Vectors via Conceptor Negation , 2018, AAAI.

[45]  Eneko Agirre,et al.  SemEval-2016 Task 1: Semantic Textual Similarity, Monolingual and Cross-Lingual Evaluation , 2016, *SEMEVAL.

[46]  Gustavo K. Rohde,et al.  A Linear Optimal Transportation Framework for Quantifying and Visualizing Variations in Sets of Images , 2012, International Journal of Computer Vision.

[47]  Christopher Joseph Pal,et al.  Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning , 2018, ICLR.

[48]  M. Marelli,et al.  SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Textual Entailment , 2014, *SEMEVAL.

[49]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[50]  Sanjeev Arora,et al.  A Latent Variable Model Approach to PMI-based Word Embeddings , 2015, TACL.

[51]  Kilian Q. Weinberger,et al.  BERTScore: Evaluating Text Generation with BERT , 2019, ICLR.

[52]  Sanja Fidler,et al.  Skip-Thought Vectors , 2015, NIPS.

[53]  Claire Cardie,et al.  SemEval-2015 Task 2: Semantic Textual Similarity, English, Spanish and Pilot on Interpretability , 2015, *SEMEVAL.

[54]  Gemma Boleda,et al.  Distributional Semantics in Technicolor , 2012, ACL.

[55]  Holger Schwenk,et al.  Supervised Learning of Universal Sentence Representations from Natural Language Inference Data , 2017, EMNLP.

[56]  Christian S. Perone,et al.  Evaluation of sentence embeddings in downstream and linguistic probing tasks , 2018, ArXiv.

[57]  Tommi S. Jaakkola,et al.  Structured Optimal Transport , 2018, AISTATS.

[58]  Evgeniy Gabrilovich,et al.  A word at a time: computing word relatedness using temporal semantic analysis , 2011, WWW.

[59]  Pramod Viswanath,et al.  All-but-the-Top: Simple and Effective Postprocessing for Word Representations , 2017, ICLR.

[60]  J. Loehlin Latent variable models , 1987 .

[61]  Mirella Lapata,et al.  Composition in Distributional Models of Semantics , 2010, Cogn. Sci..

[62]  Eneko Agirre,et al.  *SEM 2013 shared task: Semantic Textual Similarity , 2013, *SEMEVAL.