Clustering hidden Markov models with variational HEM

The hidden Markov model (HMM) is a widely used generative model for sequential data, which assumes that each observation is conditioned on the state of a hidden Markov chain. In this paper, we derive a novel algorithm to cluster HMMs based on the hierarchical EM (HEM) algorithm. The proposed algorithm (i) clusters a given collection of HMMs into groups of HMMs that are similar in terms of the distributions they represent, and (ii) characterizes each group by a "cluster center", i.e., a new HMM that is representative of the group, in a manner that is consistent with the underlying generative model of the HMM. To cope with intractable inference in the E-step, the HEM algorithm is formulated as a variational optimization problem and solved efficiently for the HMM case by leveraging an appropriate variational approximation. The benefits of the proposed algorithm, which we call variational HEM (VHEM), are demonstrated on several tasks involving time-series data, such as hierarchical clustering of motion capture sequences and automatic annotation and retrieval of music and of online handwriting data, showing improvements over current methods. In particular, our variational HEM algorithm effectively leverages large amounts of data when learning annotation models by using an efficient hierarchical estimation procedure, which reduces learning times and memory requirements while improving model robustness through better regularization.
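As an illustration of the generative assumption stated above (each observation depends only on the current state of a hidden Markov chain), the following is a minimal sketch of the standard forward recursion, which evaluates the log-likelihood an HMM assigns to a sequence. It is not the VHEM algorithm itself. It assumes discrete emissions for simplicity, and the function name `forward_loglik` and the parameter names `pi`, `A`, `B` are illustrative choices, not taken from the paper.

```python
import math


def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM.

    Assumed (illustrative) parameterization:
      pi[i]   : probability that the chain starts in hidden state i
      A[i][j] : probability of moving from state i to state j
      B[i][k] : probability of emitting symbol k while in state i
    """
    n = len(pi)
    # alpha[i] is proportional to p(obs[0..t], state_t = i).
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    scale = sum(alpha)
    loglik = math.log(scale)
    alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    for y in obs[1:]:
        # Propagate through the Markov chain, then weight by the emission.
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][y]
            for j in range(n)
        ]
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik
```

Two HMMs that assign similar likelihoods to the same sequences represent similar distributions, which is the notion of similarity the abstract refers to. The paper's method compares models directly rather than through sampled sequences; this sketch only makes the underlying model concrete.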
