A Reweighted Scheme to Improve the Representation of the Neural Autoregressive Distribution Estimator

The neural autoregressive distribution estimator (NADE) is a competitive model for density estimation in machine learning. While NADE performs well at estimating densities, its usefulness for other tasks remains to be improved. In this paper, we introduce a simple and efficient reweighted scheme that modifies the parameters of a learned NADE. The scheme exploits the structure of NADE: the weights are derived from the activations of the corresponding hidden layers. Experiments show that the features obtained by unsupervised learning with our reweighted scheme are more meaningful, and that using them to initialize neural networks yields a significant performance improvement as well.
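
To make the scheme concrete, the sketch below implements the standard NADE conditionals in NumPy together with one hypothetical activation-based reweighting rule. The factorization p(x) = prod_d p(x_d | x_<d) with sigmoid hidden units is the well-known NADE formulation; the `reweight` function, its mean-activation scaling rule, and all parameter names are illustrative assumptions, since the abstract does not spell out the paper's exact update.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nade_log_likelihood(x, W, V, b, c):
    """Log-likelihood of a binary vector x under NADE.

    p(x) = prod_d p(x_d | x_<d), where
      h_d = sigmoid(c + W[:, :d] @ x[:d])            (hidden activations)
      p(x_d = 1 | x_<d) = sigmoid(b[d] + V[d] @ h_d)
    """
    a = c.astype(float)               # running pre-activation; updated in O(H) per step
    ll = 0.0
    for d in range(x.shape[0]):
        h = sigmoid(a)                # h_d
        p = sigmoid(b[d] + V[d] @ h)  # p(x_d = 1 | x_<d)
        ll += x[d] * np.log(p) + (1.0 - x[d]) * np.log(1.0 - p)
        a += W[:, d] * x[d]           # fold x_d in for the next conditional
    return ll

def reweight(W, V, c, X):
    """Hypothetical reweighting (an assumption, not the paper's stated rule):
    scale each hidden unit's incoming and outgoing weights by its mean
    activation over the data X, damping weakly used units."""
    H = sigmoid(c[None, :] + X @ W.T)  # activations with the full input visible
                                       # (a simplification of the per-step h_d)
    scale = H.mean(axis=0)             # one scalar per hidden unit
    return W * scale[:, None], V * scale[None, :]

# Toy usage with random parameters (illustrative only).
rng = np.random.default_rng(0)
D, n_hidden = 8, 16
W = rng.normal(0.0, 0.1, (n_hidden, D))
V = rng.normal(0.0, 0.1, (D, n_hidden))
b, c = np.zeros(D), np.zeros(n_hidden)
X = rng.integers(0, 2, size=(100, D)).astype(float)

W_r, V_r = reweight(W, V, c, X)
print(nade_log_likelihood(X[0], W, V, b, c))      # before reweighting
print(nade_log_likelihood(X[0], W_r, V_r, b, c))  # after reweighting
```

The running pre-activation `a` is the usual NADE trick for evaluating all D conditionals in O(DH) time rather than recomputing each hidden layer from scratch.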
