Distance-Preserving Probabilistic Embeddings with Side Information: Variational Bayesian Multidimensional Scaling Gaussian Process

Embeddings, or vector representations of objects, have been used with remarkable success in various machine learning and AI tasks, from dimensionality reduction and data visualization to vision and natural language processing. In this work, we seek probabilistic embeddings that faithfully represent observed relationships between objects (e.g., physical distances, preferences). We derive a novel variational Bayesian variant of multidimensional scaling that (i) provides a posterior distribution over latent points without computationally heavy Markov chain Monte Carlo (MCMC) sampling, and (ii) can leverage existing side information using sparse Gaussian processes (GPs) to learn a nonlinear mapping to the embedding. By partitioning entities, our method naturally handles incomplete side information from multiple domains, e.g., in product recommendation, where ratings are available but not all users and items have associated profiles. Furthermore, the derived approximate bounds can be used to discover the intrinsic dimensionality of the data and to limit embedding complexity. We demonstrate the effectiveness of our method empirically on three synthetic problems and on the real-world tasks of political unfolding analysis and multi-sensor localization.
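To make the notion of a distance-preserving embedding concrete, the following is a minimal sketch of classical (Torgerson) multidimensional scaling, the non-probabilistic point-estimate baseline. It is not the variational Bayesian method of this work: it yields no posterior over latent points and uses no side information. The function names, the synthetic data, and the choice of NumPy are assumptions made for illustration only.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(D, dim=2):
    """Classical MDS: embed n objects in `dim` dimensions from an n x n distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:dim]     # indices of the `dim` largest
    lam = np.clip(eigvals[idx], 0.0, None)    # guard against tiny negative values
    return eigvecs[:, idx] * np.sqrt(lam)


# Illustrative synthetic data (hypothetical): 20 points in the plane.
rng = np.random.default_rng(0)
X_true = rng.normal(size=(20, 2))
D = pairwise_distances(X_true)

X_hat = classical_mds(D, dim=2)
D_hat = pairwise_distances(X_hat)
max_err = np.abs(D - D_hat).max()  # distortion of pairwise distances
```

With noise-free Euclidean distances, the recovered points match the originals up to rotation, reflection, and translation, so the pairwise distances are reproduced to numerical precision. A probabilistic treatment becomes useful when the observed distances are noisy or incomplete, since a point estimate of this kind conveys no uncertainty about the latent positions.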
