Metric Properties of Structured Data Visualizations through Generative Probabilistic Modeling

Recently, generative probabilistic modeling principles were extended to the visualization of structured data types, such as sequences. The models are formulated as constrained mixtures of sequence models, a generalization of density-based visualization methods previously developed for static data sets. To explore visualization plots effectively, one needs to understand local directional magnification factors, i.e. the extent to which small positional changes on the visualization plot lead to changes in the local noise models explaining the structured data. Magnification factors are useful for highlighting boundaries between data clusters. In this paper we present two techniques for estimating the local metric induced on the sequence space by the model formulation. We first verify our approach in two controlled experiments involving artificially generated sequences. We then illustrate our methodology on sequences representing chorales by J.S. Bach.
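
The idea of a local metric induced by the model can be illustrated with a minimal sketch. It assumes a deliberately simplified noise model. Each point x of a 2-D latent (visualization) space is mapped through Gaussian radial basis functions and a softmax to a single categorical distribution over symbols. A constrained mixture of sequence models would instead use a full sequence model, such as a hidden Markov model. The metric is taken as the pull-back of the Fisher information metric, G(x) = J^T F J. Here J is the Jacobian of the logits with respect to x, and F = diag(p) - p p^T is the Fisher information of the categorical distribution in logit coordinates. This is one standard construction and is not claimed to be either of the paper's two techniques. From G(x), sqrt(u^T G u) gives the directional magnification along a unit direction u, and sqrt(det G) gives an area magnification factor. All function names, the grid of centres, the RBF width and the random weights are hypothetical.

```python
import numpy as np

def rbf_features(x, centres, width):
    """Gaussian RBF activations phi_k(x) and their Jacobian d phi / d x (shape K x 2)."""
    diff = x - centres                                   # (K, 2): x - c_k
    phi = np.exp(-np.sum(diff**2, axis=1) / (2.0 * width**2))
    dphi = -phi[:, None] * diff / width**2               # d phi_k / d x
    return phi, dphi

def noise_model(x, W, centres, width):
    """Hypothetical stand-in noise model: categorical p(x) = softmax(W phi(x))."""
    phi, _ = rbf_features(x, centres, width)
    a = W @ phi
    a = a - a.max()                                      # numerical stability only
    p = np.exp(a)
    return p / p.sum()

def local_metric(x, W, centres, width):
    """Pull-back of the Fisher metric: G(x) = J^T (diag(p) - p p^T) J, J = d logits / d x."""
    _, dphi = rbf_features(x, centres, width)
    p = noise_model(x, W, centres, width)
    J = W @ dphi                                         # (S, 2)
    F = np.diag(p) - np.outer(p, p)                      # Fisher information w.r.t. logits
    return J.T @ F @ J                                   # (2, 2), symmetric PSD

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) between two categorical distributions."""
    return float(np.sum(p * np.log(p / q)))

# Toy setup (all values hypothetical): 9 RBF centres on a 3x3 grid, 4 symbols.
rng = np.random.default_rng(0)
g = np.linspace(-1.0, 1.0, 3)
centres = np.array([[gx, gy] for gx in g for gy in g])   # (9, 2)
W = rng.normal(size=(4, len(centres)))                   # random weights, illustration only
width = 0.5

x = np.array([0.2, -0.3])                                # a point on the visualization plot
G = local_metric(x, W, centres, width)
mag = np.sqrt(max(np.linalg.det(G), 0.0))                # area magnification factor at x
u = np.array([1.0, 0.0])                                 # unit direction on the plot
directional = np.sqrt(u @ G @ u)                         # directional magnification along u

# Check: KL between nearby noise models is approximately 0.5 * eps^2 * u^T G u.
eps = 1e-4
kl_num = kl(noise_model(x, W, centres, width), noise_model(x + eps * u, W, centres, width))
print(directional, mag, kl_num, 0.5 * eps**2 * (u @ G @ u))
```

The final check relies on a standard result. For nearby latent points, the Kullback-Leibler divergence between the two noise models is approximately 0.5 eps^2 u^T G(x) u. Large values of G(x) therefore mark regions of the plot where the noise models change quickly, such as boundaries between clusters.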
