MAP approximation to the variational Bayes Gaussian mixture model and application

Learning with variational inference can broadly be viewed as first estimating the class assignment variable and then using it to estimate the parameters of the mixture model. These estimates are mainly obtained by computing expectations under the prior models. However, learning is not restricted to expectation. Several authors report other possible configurations that use different combinations of maximization or expectation for the estimation. For instance, variational inference is generalized under the expectation–expectation (EE) algorithm. Inspired by this, another variant known as the maximization–maximization (MM) algorithm has recently been exploited on various models such as the Gaussian mixture, the Field-of-Gaussians mixture, and the sparse-coding-based Fisher vector. Despite these recent successes, MM is not without issues. Firstly, theoretical studies comparing MM to EE are rare. Secondly, the computational efficiency and accuracy of MM are seldom compared to those of EE. Hence, it is difficult to justify the use of MM over a mainstream learner such as EE or even Gibbs sampling. In this work, we revisit the learning of EE and MM on a simple Bayesian GMM case. We also make a theoretical comparison of MM with EE and find that they in fact obtain nearly identical solutions. In the experiments, we perform unsupervised classification, comparing the computational efficiency and accuracy of MM and EE on two datasets. We also perform unsupervised feature learning, comparing a Bayesian approach such as MM with other maximum-likelihood approaches on two datasets.
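
To make the EE/MM contrast concrete, below is a minimal, hypothetical sketch in Python (NumPy/SciPy) on a toy 1-D Bayesian GMM with a Dirichlet prior on the mixing weights, a Gaussian prior on each component mean, known component variances, and plug-in means in the assignment step. It illustrates soft-expectation (EE) versus hard-maximization (MM) assignments only; it is not the paper's exact model or algorithm, and all variable names are illustrative assumptions.

import numpy as np
from scipy.special import digamma, logsumexp

rng = np.random.default_rng(0)

# Synthetic 1-D data from two well-separated Gaussians.
x = np.concatenate([rng.normal(-2.0, 0.5, 150), rng.normal(2.0, 0.5, 150)])
N, K = x.size, 2
sigma2 = 0.25            # known component variance (simplifying assumption)
alpha0 = 2.0             # Dirichlet concentration; > 1 keeps the Dirichlet mode well defined
m0, tau2 = 0.0, 10.0     # Gaussian prior on each component mean

def gauss_loglik(mu):
    # N x K matrix of log N(x_n | mu_k, sigma2), with the means plugged in.
    return -0.5 * np.log(2 * np.pi * sigma2) - 0.5 * (x[:, None] - mu) ** 2 / sigma2

def ee_step(mu, alpha):
    # EE flavour: soft responsibilities built from E[log pi_k] = psi(alpha_k) - psi(sum_j alpha_j).
    log_pi = digamma(alpha) - digamma(alpha.sum())
    log_r = log_pi + gauss_loglik(mu)
    return np.exp(log_r - logsumexp(log_r, axis=1, keepdims=True))

def mm_step(mu, alpha):
    # MM (MAP) flavour: hard assignments built from the Dirichlet mode (alpha_k - 1) / (sum_j alpha_j - K).
    pi_map = (alpha - 1.0) / (alpha.sum() - K)
    log_r = np.log(pi_map) + gauss_loglik(mu)
    r = np.zeros((N, K))
    r[np.arange(N), np.argmax(log_r, axis=1)] = 1.0
    return r

def update(r):
    # Shared conjugate update of the Dirichlet counts and component means from
    # soft (EE) or hard (MM) responsibilities; the Gaussian posterior mean and
    # mode coincide, so the same formula serves both variants.
    nk = r.sum(axis=0)
    alpha = alpha0 + nk
    mu = (m0 / tau2 + (r * x[:, None]).sum(axis=0) / sigma2) / (1.0 / tau2 + nk / sigma2)
    return mu, alpha

for name, step in [("EE", ee_step), ("MM", mm_step)]:
    mu, alpha = np.array([-1.0, 1.0]), np.full(K, alpha0)
    for _ in range(50):
        mu, alpha = update(step(mu, alpha))
    print(name, "means:", np.round(mu, 3), "weights:", np.round(alpha / alpha.sum(), 3))

The only difference between the two variants in this sketch is the assignment step (expected log weights with soft responsibilities versus MAP point estimates with hard assignments); the parameter update is shared, which is why the two learners can be expected to reach very similar solutions on well-separated data.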
