RNADE: The real-valued neural autoregressive density-estimator

We introduce RNADE, a new model for joint density estimation of real-valued vectors. Our model calculates the density of a datapoint as the product of one-dimensional conditionals modeled using mixture density networks with shared parameters. RNADE learns a distributed representation of the data, while having a tractable expression for the calculation of densities. A tractable likelihood allows direct comparison with other methods and training by standard gradient-based optimizers. We compare the performance of RNADE on several datasets of heterogeneous and perceptual data, finding it outperforms mixture models in all but one case.
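The factorization described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: it assumes rectified hidden units, `K` Gaussian components per conditional, and a shared running activation updated after each observed dimension; all parameter names (`W`, `c`, `V_alpha`, etc.) are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def rnade_log_density(x, W, c, V_alpha, b_alpha, V_mu, b_mu, V_s, b_s):
    """Log-density of one datapoint x (shape (D,)) as a product of
    one-dimensional Gaussian-mixture conditionals, sketching the
    autoregressive scheme: parameters of p(x_d | x_<d) come from a
    hidden state shared across dimensions and updated incrementally."""
    D = x.shape[0]
    a = c.copy()                 # running pre-activation, shape (H,)
    logp = 0.0
    for d in range(D):
        h = np.maximum(a, 0)     # rectified hidden units (assumption)
        alpha = softmax(V_alpha[d] @ h + b_alpha[d])  # mixing weights, (K,)
        mu = V_mu[d] @ h + b_mu[d]                    # component means, (K,)
        sigma = np.exp(V_s[d] @ h + b_s[d])           # component stds, (K,)
        # log-density of x[d] under each 1-D Gaussian component
        log_norm = (-0.5 * ((x[d] - mu) / sigma) ** 2
                    - np.log(sigma) - 0.5 * np.log(2 * np.pi))
        logp += np.log(np.sum(alpha * np.exp(log_norm)) + 1e-300)
        a = a + x[d] * W[:, d]   # fold the observed x[d] into the state
    return logp
```

Because the log-likelihood is an exact, differentiable expression of the parameters, a model of this form can be trained directly with standard gradient-based optimizers, as the abstract notes.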
