PRIMAL-GMM: PaRametrIc MAnifold Learning of Gaussian Mixture Models

We propose a PaRametrIc MAnifold Learning (PRIMAL) algorithm for Gaussian mixture models (GMMs), assuming that the GMMs lie on or near a manifold of probability distributions generated from a low-dimensional hierarchical latent space through parametric mappings. Inspired by principal component analysis (PCA), the generative processes for the priors, means, and covariance matrices are each modeled by their own latent space and parametric mapping. The dependencies between these latent spaces are then captured by a hierarchical latent space through a linear or kernelized mapping. The function parameters and the hierarchical latent space are learned by minimizing the reconstruction error between the ground-truth GMMs and the manifold-generated GMMs, measured by the Kullback-Leibler divergence (KLD). Because the KLD between GMMs is intractable, a variational approximation is employed, and a variational EM algorithm is derived to optimize the objective function. Experiments on synthetic data, flow cytometry analysis, eye-fixation analysis, and topic models show that PRIMAL learns a continuous and interpretable manifold of GMM distributions and achieves a minimum reconstruction error.
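
To make the reconstruction-error term concrete, the following is a minimal sketch of one well-known variational approximation to the KLD between two GMMs, due to Hershey and Olsen (2007). It is an assumption that this form resembles the one used in PRIMAL; the abstract does not state the exact bound. The function names `kl_gauss` and `kl_gmm_variational` and the argument layout (weights, list of means, list of covariances) are hypothetical and chosen only for illustration.

```python
import numpy as np


def kl_gauss(mu0, cov0, mu1, cov1):
    """Closed-form KL( N(mu0, cov0) || N(mu1, cov1) ) for full covariances."""
    mu0, mu1 = np.asarray(mu0, float), np.asarray(mu1, float)
    cov0, cov1 = np.asarray(cov0, float), np.asarray(cov1, float)
    d = mu0.shape[0]
    diff = mu1 - mu0
    cov1_inv = np.linalg.inv(cov1)
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (
        np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff - d + logdet1 - logdet0
    )


def kl_gmm_variational(pi_f, mus_f, covs_f, pi_g, mus_g, covs_g):
    """Variational approximation of KL(f || g) between two GMMs.

    Assumed form (Hershey and Olsen, 2007):
      sum_a pi_a * log( sum_a' pi_a' exp(-KL(f_a || f_a'))
                        / sum_b  w_b  exp(-KL(f_a || g_b)) )
    This is an illustrative sketch, not the authors' implementation.
    """
    total = 0.0
    for a in range(len(pi_f)):
        # Similarity of component a to the components of its own mixture f.
        num = sum(
            pi_f[a2] * np.exp(-kl_gauss(mus_f[a], covs_f[a], mus_f[a2], covs_f[a2]))
            for a2 in range(len(pi_f))
        )
        # Similarity of component a to the components of the other mixture g.
        den = sum(
            pi_g[b] * np.exp(-kl_gauss(mus_f[a], covs_f[a], mus_g[b], covs_g[b]))
            for b in range(len(pi_g))
        )
        total += pi_f[a] * np.log(num / den)
    return total
```

When both mixtures have a single component, this approximation reduces to the closed-form Gaussian KLD, and it evaluates to zero when the two mixtures are identical. The sketch exponentiates the pairwise divergences directly, so a log-sum-exp formulation would be a safer choice for widely separated components.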
