Posterior Approximation using Stochastic Gradient Ascent with Adaptive Stepsize

Abstract Scalable posterior approximation algorithms allow Bayesian nonparametric models such as the Dirichlet process mixture to scale to larger datasets at a fraction of the cost. Recent algorithms, notably stochastic variational inference, perform local learning from minibatches. The main limitation of stochastic variational inference is that it relies on closed-form solutions. Stochastic gradient ascent is a modern approach to machine learning and is widely deployed in the training of deep neural networks. In this work, we explore stochastic gradient ascent as a fast algorithm for posterior approximation of the Dirichlet process mixture. However, stochastic gradient ascent alone is not optimal for learning. To achieve both speed and performance, we focus on stepsize optimization in stochastic gradient ascent. As an intermediate approach, we first optimize the stepsize using the momentum method. Finally, we introduce Fisher information to allow an adaptive stepsize in our posterior approximation. In the experiments, we show that our stochastic gradient ascent approach does not sacrifice performance for speed when compared to closed-form coordinate ascent learning on the same datasets. Lastly, our approach is compatible with deep ConvNet features and scales to datasets with many classes, such as Caltech256 and SUN397.
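
To make the flavor of such an update concrete, the following is a minimal sketch of a stochastic gradient ascent step that combines momentum with a Fisher-information-style adaptive stepsize. It is an illustrative assumption, not the paper's actual update rule for the Dirichlet process mixture: the function name sga_step, the diagonal Fisher estimate built from squared minibatch gradients, and all hyperparameter values are hypothetical, and the toy objective stands in for the variational objective.

import numpy as np

def sga_step(lam, grad, state, lr=0.05, beta=0.9, eps=1e-8):
    # One stochastic gradient ascent step on a variational parameter vector `lam`.
    # Momentum: exponential moving average of the noisy minibatch gradient.
    state["m"] = beta * state["m"] + (1.0 - beta) * grad
    # Diagonal empirical Fisher estimate: running average of squared gradients.
    state["F"] = beta * state["F"] + (1.0 - beta) * grad ** 2
    # Adaptive stepsize: precondition the momentum by the inverse Fisher diagonal.
    return lam + lr * state["m"] / (state["F"] + eps)

# Toy usage: ascend the concave objective f(lam) = -sum((lam - 3)^2) from noisy gradients.
rng = np.random.default_rng(0)
lam = np.zeros(5)
state = {"m": np.zeros(5), "F": np.zeros(5)}
for _ in range(2000):
    grad = -2.0 * (lam - 3.0) + rng.normal(scale=0.5, size=5)  # noisy "minibatch" gradient
    lam = sga_step(lam, grad, state)
print(np.round(lam, 2))  # should land near the maximizer [3, 3, 3, 3, 3]

The design choice shown here is that the per-coordinate stepsize shrinks where the estimated Fisher information (gradient variability) is large and grows where it is small, while the momentum term smooths minibatch noise; this mirrors the abstract's progression from momentum to a Fisher-information-based adaptive stepsize.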
