Making nonlinear manifold learning models interpretable: The manifold grand tour

Smooth nonlinear topographic maps of the data distribution are used to guide a Grand Tour visualisation. Linear views of the data that are most consistent with the data structure in the maps are prioritised. The resulting visualisations are useful and cannot be obtained by other, more classical approaches.

Dimensionality reduction is required to produce visualisations of high dimensional data. Within this framework, one of the most straightforward approaches to visualising high dimensional data is to reduce complexity by applying linear projections while tumbling the projection axes in a defined sequence, which generates a Grand Tour of the data. We propose using smooth nonlinear topographic maps of the data distribution to guide the Grand Tour, increasing the effectiveness of this approach by prioritising the linear views of the data that are most consistent with the global data structure in these maps. A further consequence of this approach is that the topographic map itself can be visualised directly by projecting it onto the spaces that discern structure in the data. The experimental results on standard databases reported in this paper, using self-organising maps and generative topographic mapping, illustrate the practical value of the proposed approach. The main novelty of our proposal is the definition of a systematic way to guide the search for data views in the Grand Tour, selecting and prioritising some of them on the basis of nonlinear manifold models.
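As a rough illustration of the unguided Grand Tour that the abstract takes as its starting point, the following minimal sketch tumbles a 2-D projection plane through a sequence of random target planes. It is not the manifold-guided procedure proposed in the paper: the function names (`orthonormalise`, `random_frame`, `tour_frames`), the random choice of target planes, and the use of linear blending followed by re-orthonormalisation (a simplification of the geodesic interpolation described in the tour literature) are all assumptions made for illustration.

```python
import numpy as np

def orthonormalise(a):
    """Orthonormalise the columns of `a` via QR, fixing signs so that the
    result varies continuously with `a` (diagonal of R kept positive)."""
    q, r = np.linalg.qr(a)
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0
    return q * signs

def random_frame(dim, rng, k=2):
    """Return a dim x k matrix with orthonormal columns (a random plane)."""
    return orthonormalise(rng.standard_normal((dim, k)))

def tour_frames(dim, n_targets=5, steps=20, seed=0):
    """Yield projection frames that drift smoothly between random planes.

    Hypothetical simplification: frames are blended linearly and then
    re-orthonormalised, rather than interpolated along a geodesic.
    """
    rng = np.random.default_rng(seed)
    current = random_frame(dim, rng)
    for _ in range(n_targets):
        target = random_frame(dim, rng)
        for t in np.linspace(0.0, 1.0, steps, endpoint=False):
            yield orthonormalise((1.0 - t) * current + t * target)
        current = target

# Usage: project synthetic 5-D data onto each frame of the tour.
data = np.random.default_rng(1).standard_normal((100, 5))
views = [data @ frame for frame in tour_frames(dim=5)]
```

Each element of `views` is a 100 x 2 array that could be drawn as one scatter plot in an animation. A guided variant in the spirit of the paper would replace the random target planes with planes selected using the fitted topographic map; how that selection is made is not specified in the abstract and is not attempted here.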
