Multi-level visualisation using Gaussian process latent variable models

Projecting a high-dimensional dataset onto a two-dimensional space is a useful way to visualise structures and relationships in the data. However, a single two-dimensional visualisation may not display all of the intrinsic structure. Hierarchical (multi-level) visualisation methods have therefore been used to gain a more detailed understanding of the data. Here we propose a multi-level Gaussian process latent variable model (MLGPLVM). MLGPLVM works by segmenting the data in the visualisation space (e.g. with K-means, a Gaussian mixture model or interactive clustering) and then fitting a separate visualisation model to each subset. To measure the quality of the multi-level visualisation with respect to parent and child models, we use trustworthiness, continuity, mean relative rank errors, visualisation distance distortion and the negative log-likelihood per point. We evaluate the MLGPLVM approach on the ‘Oil Flow’ dataset and on a dataset of protein electrostatic potentials for the human ‘Major Histocompatibility Complex (MHC) class I’. In both cases, visual inspection and the quantitative quality measures indicate that visualisation quality improves at the lower levels of the hierarchy.
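
The segment-then-refit procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: PCA stands in for the GPLVM at both levels, K-means is used for segmentation, and the function name `multilevel_projection` and its parameters are hypothetical. Trustworthiness comes from scikit-learn; continuity is computed here as trustworthiness with the roles of the data space and the visualisation space swapped.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness


def multilevel_projection(X, n_clusters=3, n_neighbors=5, seed=0):
    """Two-level visualisation sketch: one parent map, one child map per cluster.

    Assumption: PCA replaces the GPLVM purely to keep the example short.
    """
    # Parent level: project the whole dataset to two dimensions.
    Z_parent = PCA(n_components=2).fit_transform(X)

    # Segment in the visualisation (latent) space, not in the data space.
    labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=seed
    ).fit_predict(Z_parent)

    children = {}
    for k in range(n_clusters):
        idx = np.flatnonzero(labels == k)
        # scikit-learn requires n_neighbors < n_samples / 2 for trustworthiness.
        if len(idx) <= 2 * n_neighbors:
            continue
        X_k = X[idx]

        # Child level: fit a fresh two-dimensional model to this subset only.
        Z_child = PCA(n_components=2).fit_transform(X_k)

        children[k] = {
            "indices": idx,
            "latent": Z_child,
            # Quality of the parent map restricted to this subset.
            "trust_parent": trustworthiness(X_k, Z_parent[idx], n_neighbors=n_neighbors),
            # Quality of the child map on the same points.
            "trust_child": trustworthiness(X_k, Z_child, n_neighbors=n_neighbors),
            # Continuity: trustworthiness with the two spaces swapped.
            "cont_child": trustworthiness(Z_child, X_k, n_neighbors=n_neighbors),
        }
    return Z_parent, labels, children
```

Comparing `trust_parent` with `trust_child` for each subset gives a parent-versus-child quality comparison of the kind described in the abstract; the remaining metrics (mean relative rank errors, distance distortion, negative log-likelihood per point) are not covered by this sketch.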
