Robust Document Distance with Wasserstein-Fisher-Rao metric

Computing the distance between linguistic objects is an essential problem in natural language processing. The word mover’s distance (WMD) has been successfully applied to measure document distance by combining low-level word similarity with the framework of optimal transport (OT). However, because OT must transport all mass globally, the WMD may overestimate semantic dissimilarity when documents contain unequal semantic details. In this paper, we propose to address this overestimation issue with a novel Wasserstein-Fisher-Rao (WFR) document distance grounded in unbalanced optimal transport theory. Compared to the WMD, the WFR document distance provides a trade-off between global transportation and local truncation, which leads to a better similarity measure for documents with unequal semantic details. Moreover, an efficient pruning strategy is designed for the WFR document distance to facilitate top-k queries among a large number of documents. Extensive experimental results show that the WFR document distance achieves higher accuracy than the WMD and even its supervised variant, s-WMD.
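
To make the trade-off between global transportation and local truncation concrete, the following is a minimal sketch, not the method as implemented in the paper. It assumes the static unbalanced-OT form commonly associated with WFR: a cost C = -log cos^2(min(d / (2*eta), pi/2)) with KL-relaxed marginals, solved by an entropic Sinkhorn-style scaling iteration. The function names, the toy 2-D "embeddings", and the parameters `eta`, `eps`, and `lam` are all hypothetical choices for illustration.

```python
import math

def wfr_kernel(dist, eta, eps):
    """Gibbs kernel exp(-C/eps) for the assumed WFR-style cost
    C = -log cos^2(min(d / (2*eta), pi/2)).
    Pairs with d >= pi*eta have infinite cost, so their kernel entry is 0
    (this is the 'local truncation')."""
    kernel = []
    for row in dist:
        out = []
        for d in row:
            if d >= math.pi * eta:
                out.append(0.0)
            else:
                out.append(math.cos(d / (2.0 * eta)) ** (2.0 / eps))
        kernel.append(out)
    return kernel

def unbalanced_sinkhorn(a, b, K, eps, lam=1.0, n_iter=500):
    """Scaling iteration for entropic OT with KL-relaxed marginals.
    lam weights the marginal penalties; a large lam approaches balanced OT."""
    n, m = len(a), len(b)
    u, v = [1.0] * n, [1.0] * m
    power = lam / (lam + eps)
    for _ in range(n_iter):
        for i in range(n):
            kv = sum(K[i][j] * v[j] for j in range(m))
            # A word with no reachable partner transports nothing.
            u[i] = (a[i] / kv) ** power if kv > 0.0 else 0.0
        for j in range(m):
            ktu = sum(K[i][j] * u[i] for i in range(n))
            v[j] = (b[j] / ktu) ** power if ktu > 0.0 else 0.0
    # Transport plan P = diag(u) K diag(v).
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy documents: word positions stand in for word embeddings (hypothetical data).
doc_a = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]   # third word is an unmatched detail
doc_b = [(0.0, 0.5), (1.0, 0.5)]
weights_a = [0.5, 0.25, 0.25]                    # normalized word frequencies
weights_b = [0.5, 0.5]

dist = [[math.dist(p, q) for q in doc_b] for p in doc_a]
eta, eps = 1.0, 0.1                              # truncation radius is pi * eta
K = wfr_kernel(dist, eta, eps)
plan = unbalanced_sinkhorn(weights_a, weights_b, K, eps)

for row in plan:
    print([round(p, 4) for p in row])
```

In this sketch the third word of `doc_a` lies farther than `pi * eta` from every word of `doc_b`, so its row of the plan is zero: its mass is discarded under a bounded KL penalty rather than being transported a long distance, which is what a balanced formulation such as the WMD would require.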
