Paper information - Chapter 1 Two problems with variational expectation maximisation for time-series models

Variational methods are a key component of the approximate inference and learning toolbox. These methods fill an important middle ground: they retain distributional information about uncertainty in latent variables, unlike maximum a posteriori (MAP) methods, yet generally require less computational time than Markov chain Monte Carlo (MCMC) methods. In particular, the variational Expectation Maximisation (vEM) and variational Bayes algorithms, both of which involve variational optimisation of a free energy, are widely used in time-series modelling. Here, we investigate the success of vEM in simple probabilistic time-series models. First, we consider the inference step of vEM and show that a consequence of the well-known compactness property of variational inference is a failure to propagate uncertainty in time, which limits the usefulness of the retained distributional information. In particular, the uncertainty may appear to be smallest precisely when the approximation is poorest. Second, we consider parameter learning and analytically reveal systematic biases in the parameters found by vEM. Surprisingly, simpler variational approximations (such as mean-field) can lead to less bias than more complicated structured approximations.
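The compactness property named in the abstract can be illustrated with a small sketch. It is not taken from the chapter; it is a toy example under stated assumptions. For a zero-mean Gaussian p(x) with covariance Σ, the fully factorised Gaussian q that minimises KL(q || p) has variances 1/Λ_ii, where Λ = Σ⁻¹ is the precision matrix (a standard textbook result). The function name `mean_field_variances` and the correlation value `rho` are hypothetical choices made for illustration only.

```python
import numpy as np


def mean_field_variances(cov):
    """Variances of the factorised Gaussian q minimising KL(q || p).

    Assumes p is Gaussian with covariance `cov`. Each factor's variance is
    the reciprocal of the matching diagonal entry of the precision matrix.
    """
    precision = np.linalg.inv(cov)
    return 1.0 / np.diag(precision)


# Illustrative assumption: two unit-variance latent variables (for example,
# neighbouring time steps of a slowly varying latent) with correlation rho.
rho = 0.9
cov = np.array([[1.0, rho],
                [rho, 1.0]])

true_var = np.diag(cov)               # exact marginal variances
mf_var = mean_field_variances(cov)    # mean-field variances, equal to 1 - rho**2

print("true marginal variances:", true_var)
print("mean-field variances:   ", mf_var)
```

In this toy case the mean-field variance is 1 - rho², so it shrinks as the correlation grows even though the true marginal variance stays at 1. The factorised approximation therefore reports the least uncertainty exactly where the independence assumption is least accurate.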

Richard E. Turner | M. Sahani
