A Probabilistic Framework for Deep Learning

We develop a probabilistic framework for deep learning based on the Deep Rendering Mixture Model (DRMM), a new generative probabilistic model that explicitly captures variations in data due to latent task nuisance variables. We demonstrate that max-sum inference in the DRMM yields an algorithm that exactly reproduces the operations in deep convolutional neural networks (DCNs), providing a first-principles derivation. Our framework provides new insights into the successes and shortcomings of DCNs, as well as a principled route to their improvement. DRMM training via the Expectation-Maximization (EM) algorithm is a powerful alternative to DCN back-propagation, and initial training results are promising. Classification based on the DRMM and its variants outperforms DCNs in supervised digit classification, training 2-3x faster while achieving similar accuracy. Moreover, the DRMM is applicable to semi-supervised and unsupervised learning tasks, achieving results that are state-of-the-art in several categories on the MNIST benchmark and comparable to the state of the art on the CIFAR10 benchmark.
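
To make the max-sum claim above concrete, here is a minimal NumPy sketch of one layer of inference in a drastically simplified, single-channel rendering mixture model. All names, shapes, and the 2x2 pooling window are illustrative assumptions for this sketch, not the paper's notation or full model; the point is only that maximizing a log-posterior over latent nuisance variables yields the familiar DCN triple of convolution, ReLU, and max-pooling.

```python
import numpy as np

def drmm_layer_inference(x, templates, biases):
    """One layer of max-sum inference in a toy rendering mixture model.

    Under factorized-rendering assumptions, maximizing the log-posterior
    over the latent nuisance variables reduces to three steps that mirror
    a DCN layer: template matching (convolution), a max over an
    object-present/absent switch (ReLU), and a max over local translation
    nuisances (max-pooling).

    x:         input feature map, shape (H, W)
    templates: rendering templates / filter bank, shape (K, h, w)
    biases:    per-template log-prior offsets, shape (K,)
    """
    K, h, w = templates.shape
    H, W = x.shape
    Ho, Wo = H - h + 1, W - w + 1

    # 1) Template matching: cross-correlate each rendering template with x
    #    and add its log-prior offset (a plain convolution-plus-bias).
    scores = np.empty((K, Ho, Wo))
    for k in range(K):
        for i in range(Ho):
            for j in range(Wo):
                scores[k, i, j] = np.sum(x[i:i + h, j:j + w] * templates[k]) + biases[k]

    # 2) ReLU: max over the binary "object present vs. absent" switch,
    #    where the absent hypothesis contributes a score of zero.
    scores = np.maximum(scores, 0.0)

    # 3) Max-pooling: max over the local translation nuisance,
    #    here a non-overlapping 2x2 window (odd borders trimmed).
    trimmed = scores[:, : Ho - Ho % 2, : Wo - Wo % 2]
    pooled = trimmed.reshape(K, Ho // 2, 2, Wo // 2, 2).max(axis=(2, 4))
    return pooled
```

On this toy single-channel setup the three steps line up one-to-one with a standard DCN layer; the paper derives the corresponding result for the full multi-channel, multi-layer DRMM.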
