Predictive Modelling of Heterogeneous Sequence Collections by Topographic Ordering of Histories

Abstract: We propose a model-based approach to the twofold problem of prediction and exploratory analysis of heterogeneous symbolic sequence collections. Our model seeks low-entropy local representations, joined together by a smooth nonlinear mixing process. Low-entropy components are desirable because they tend to be both more interpretable and more predictable. The nonlinear mixing in turn acts as a regulariser and, in addition, creates a topographic ordering of the sequence histories, which is useful for exploratory purposes. The two modelling elements are combined through the generative probabilistic formalism, which ensures a flexible and technically sound predictive modelling framework. Unlike previous generative topographic modelling approaches for discrete data, the estimation algorithm associated with our model is designed to scale to large data sets by exploiting data sparseness. In addition, local convergence is guaranteed without the need to tune optimisation parameters or to approximate the non-Gaussian likelihood. These characteristics make it the first generative topographic model for discrete symbolic data with large-scale real-world applicability. We analyse and discuss the relationship of our approach to a number of models and methods. We empirically demonstrate robustness against varying sample sizes, leading to significant improvements in predictive performance over the state of the art. Finally, we detail an application to the prediction and exploratory analysis of a large real-world web navigation sequence collection.
