Iterative Neural Autoregressive Distribution Estimator (NADE-k)

Training of the neural autoregressive density estimator (NADE) can be viewed as performing one step of probabilistic inference on the missing values in data. We propose a new model that extends this inference scheme to multiple steps, arguing that it is easier to learn to improve a reconstruction over k steps than to learn to reconstruct in a single inference step. The proposed model is an unsupervised building block for deep learning that combines the desirable properties of NADE and multi-prediction training: (1) its test likelihood can be computed analytically, (2) it is easy to generate independent samples from it, and (3) it uses an inference engine that is a superset of variational inference for Boltzmann machines. The proposed NADE-k is competitive with the state of the art in density estimation on the two datasets tested.

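The k-step inference scheme described above can be illustrated with a small reconstruction loop: observed entries are clamped while missing entries are repeatedly replaced by the model's reconstruction. The sketch below is a minimal illustration under assumed notation, not the paper's implementation: the weight matrices W and V, the biases b and c, the single hidden layer, and the bias-based initialization of the missing entries are all illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def k_step_reconstruct(x, mask, W, c, V, b, k=3):
    """Iteratively refine a reconstruction of the missing (mask == 0)
    entries of a binary vector x over k inference steps.

    x    : (d,) observed binary vector (missing entries may hold anything)
    mask : (d,) 1.0 for observed entries, 0.0 for missing ones
    W, c : encoder weights (d, h) and hidden biases (h,)   -- illustrative
    V, b : decoder weights (h, d) and visible biases (d,)  -- illustrative
    """
    # Step 0: fill missing entries from the visible biases (an assumed choice).
    v = mask * x + (1 - mask) * sigmoid(b)
    for _ in range(k):
        h = sigmoid(v @ W + c)              # encode the current filled-in vector
        x_hat = sigmoid(h @ V + b)          # decode a fresh reconstruction
        v = mask * x + (1 - mask) * x_hat   # clamp observed values, update missing ones
    return v

# Toy usage: 6-dimensional input, 4 hidden units, half the entries missing.
rng = np.random.default_rng(0)
d, h = 6, 4
W, V = rng.normal(0, 0.1, (d, h)), rng.normal(0, 0.1, (h, d))
c, b = np.zeros(h), np.zeros(d)
x = rng.integers(0, 2, d).astype(float)
mask = np.array([1, 1, 1, 0, 0, 0], dtype=float)
print(k_step_reconstruct(x, mask, W, c, V, b, k=3))
```

With k = 1 this loop reduces to a single encode/decode pass, which corresponds to the single-step inference view of NADE training mentioned above; larger k lets later steps refine the imputation produced by earlier ones.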