Paper Information - The Gaussian Transform

The Gaussian Transform

We introduce the Gaussian transform (GT), an optimal-transport-inspired iterative method for denoising and enhancing latent structures in datasets. Under the hood, GT generates a new distance function (the GT distance) on a given dataset by computing the $\ell^2$-Wasserstein distance between certain Gaussian density estimates obtained by localizing the dataset to individual points. Our contribution is twofold: (1) theoretically, we establish first that GT is stable under perturbations and second that, in the continuous case, each point possesses an asymptotically ellipsoidal neighborhood with respect to the GT distance; (2) computationally, we accelerate GT both by identifying a strategy for reducing the number of matrix square root computations inherent to the $\ell^2$-Wasserstein distance between Gaussian measures and by avoiding redundant computations of GT distances between points via enhanced neighborhood mechanisms. We also observe that GT is both a generalization and a strengthening of the mean shift (MS) method, and a computationally efficient specialization of the recently proposed Wasserstein Transform (WT) method. We perform extensive experiments comparing the performance of GT, MS, and WT in different scenarios.
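As a rough illustration of the distance described in the abstract, the sketch below computes a GT-style distance matrix with NumPy. It relies on the standard closed form for the $\ell^2$-Wasserstein distance between Gaussian measures, $W_2^2 = \|\mu_1 - \mu_2\|^2 + \mathrm{tr}\big(\Sigma_1 + \Sigma_2 - 2(\Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2})^{1/2}\big)$. The abstract does not specify how the dataset is localized, so the hard ball of radius `radius` used here is an assumption, and the function names (`psd_sqrt`, `gaussian_w2`, `local_gaussian`, `gt_distance_matrix`) are hypothetical. None of the accelerations mentioned in the abstract are implemented.

```python
import numpy as np


def psd_sqrt(mat):
    """Square root of a symmetric positive semidefinite matrix via eigendecomposition."""
    sym = (mat + mat.T) / 2.0  # symmetrize to suppress round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(sym)
    eigvals = np.clip(eigvals, 0.0, None)  # clamp tiny negative eigenvalues
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def gaussian_w2(mean_a, cov_a, mean_b, cov_b):
    """l2-Wasserstein distance between N(mean_a, cov_a) and N(mean_b, cov_b)."""
    root_a = psd_sqrt(cov_a)
    cross = psd_sqrt(root_a @ cov_b @ root_a)
    cov_term = np.trace(cov_a) + np.trace(cov_b) - 2.0 * np.trace(cross)
    mean_term = np.sum((mean_a - mean_b) ** 2)
    return float(np.sqrt(mean_term + max(cov_term, 0.0)))


def local_gaussian(points, index, radius):
    """Mean and covariance of the points within `radius` of points[index].

    Assumption: localization by a hard Euclidean ball; the source does not
    say which localization it uses.
    """
    dists = np.linalg.norm(points - points[index], axis=1)
    nbrs = points[dists <= radius]  # always contains points[index] itself
    mean = nbrs.mean(axis=0)
    centered = nbrs - mean
    cov = centered.T @ centered / len(nbrs)
    return mean, cov


def gt_distance_matrix(points, radius):
    """Pairwise Wasserstein distances between the local Gaussian estimates."""
    n = len(points)
    gaussians = [local_gaussian(points, i, radius) for i in range(n)]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = gaussian_w2(*gaussians[i], *gaussians[j])
            dist[i, j] = dist[j, i] = d
    return dist


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.normal(size=(20, 2))
    print(gt_distance_matrix(sample, radius=1.0).shape)
```

Dropping the covariance term in `gaussian_w2` leaves only the distance between local means, which is one way to see the connection to mean shift noted in the abstract. The two `psd_sqrt` calls per pair are the matrix square roots whose count the abstract says can be reduced; this sketch makes no attempt to do so.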
