The Gaussian Transform

We introduce the Gaussian transform (GT), an optimal-transport-inspired iterative method for denoising datasets and enhancing their latent structures. GT generates a new distance function (the GT distance) on a given dataset by computing the $\ell^2$-Wasserstein distance between certain Gaussian density estimates obtained by localizing the dataset to individual points. Our contribution is twofold. (1) Theoretically, we establish that GT is stable under perturbations and that, in the continuous case, each point possesses an asymptotically ellipsoidal neighborhood with respect to the GT distance. (2) Computationally, we accelerate GT by identifying a strategy that reduces the number of matrix square root computations inherent to the $\ell^2$-Wasserstein distance between Gaussian measures, and by avoiding redundant computations of GT distances between points via enhanced neighborhood mechanisms. We also observe that GT is both a generalization and a strengthening of the mean shift (MS) method, and a computationally efficient specialization of the recently proposed Wasserstein Transform (WT) method. We perform extensive experiments comparing the performance of GT, MS, and WT in different scenarios.
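To make the central quantity concrete, the following is a minimal sketch of the standard closed form for the $\ell^2$-Wasserstein distance between two Gaussian measures, $W_2^2 = \|m_1 - m_2\|^2 + \mathrm{tr}\big(\Sigma_1 + \Sigma_2 - 2(\Sigma_1^{1/2}\Sigma_2\Sigma_1^{1/2})^{1/2}\big)$, which involves the matrix square roots mentioned above. The function names, the hard-ball localization with radius `eps`, and the use of an eigendecomposition for the square root are illustrative assumptions, not the authors' implementation or their acceleration strategy.

```python
import numpy as np


def sqrtm_psd(a):
    """Square root of a symmetric positive semidefinite matrix via eigh."""
    a = (a + a.T) / 2.0                  # symmetrize against round-off
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)            # clamp tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T


def gaussian_w2(m1, s1, m2, s2):
    """l2-Wasserstein distance between N(m1, s1) and N(m2, s2)."""
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    s1, s2 = np.asarray(s1, float), np.asarray(s2, float)
    r = sqrtm_psd(s1)
    cross = sqrtm_psd(r @ s2 @ r)        # (s1^{1/2} s2 s1^{1/2})^{1/2}
    sq = np.sum((m1 - m2) ** 2) + np.trace(s1 + s2 - 2.0 * cross)
    return float(np.sqrt(max(sq, 0.0)))


def local_gaussian(points, center, eps):
    """Hypothetical localization: mean and covariance of the points lying
    within distance eps of center (assumes at least one such point)."""
    nbrs = points[np.linalg.norm(points - center, axis=1) <= eps]
    mean = nbrs.mean(axis=0)
    centered = nbrs - mean
    cov = centered.T @ centered / len(nbrs)
    return mean, cov
```

Under these assumptions, a GT-style distance between two data points `x` and `y` would be `gaussian_w2(*local_gaussian(points, x, eps), *local_gaussian(points, y, eps))`. Each call computes two matrix square roots, which is the cost the abstract's acceleration strategy aims to reduce.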
