Bayesian mixtures of Hidden Tree Markov Models for structured data clustering

Abstract: This paper addresses unsupervised learning on structured data, proposing a mixture-model approach to clustering tree samples. First, we discuss how the Switching-Parent Hidden Tree Markov Model, a compositional model for learning tree distributions, can be used to define a finite mixture model whose number of components is fixed by a hyperparameter. We then show how to relax this assumption by introducing a Bayesian non-parametric mixture model in which the number of hidden tree components is learned from data. Experimental validation on synthetic and real datasets shows the benefit of mixture models over simple hidden tree models in clustering applications. Finally, we characterize the behaviour of the two mixture models for different choices of their hyperparameters.
