The Latent-Dependent Deep Rendering Model

Equal contribution as first authors; equal contribution as last authors. University of California at Berkeley, Berkeley, USA; Rice University, Houston, USA; Amazon AI, Palo Alto, USA; Baylor College of Medicine, Houston, USA; California Institute of Technology, Pasadena, USA. Correspondence to: Nhat Ho <minhnhat@berkeley.edu>, Tan Nguyen <mn15@rice.edu>. Presented at the ICML 2018 workshop on Theoretical Foundations and Applications of Deep Generative Models.

Generative models that capture latent variations in data are difficult to design in complex domains such as natural images, where there are large numbers of nuisance variables. Given the success of Convolutional Neural Networks (CNNs) on inference tasks in such domains, we aim to design generative models whose inference corresponds to a CNN. One such class is the Deep Rendering Model (DRM). The DRM generates images via multiple levels of abstraction, from coarse to fine scale, and introduces a small set of latent variables at each level. However, the DRM makes a number of simplifying assumptions in order to derive the CNN as its bottom-up inference algorithm, and these assumptions do not correspond to realistic variations in natural images. For instance, the latent variables at different scales of the DRM are assumed to be independent. We propose an extension of the DRM, termed the Latent-Dependent Deep Rendering Model (LD-DRM), that enforces dependencies among the latent variables via a parametrized joint prior distribution. This joint prior yields a new form of regularization for training CNNs, which we call Rendering Path Normalization (RPN). Under the LD-DRM, we obtain consistent estimators for unsupervised/semi-supervised learning tasks and derive generalization bounds. Our bound suggests that the RPN regularization helps improve generalization, an observation that is corroborated in practice. In our experiments, the LD-DRM either beats or matches the state of the art on popular benchmarks, including SVHN, CIFAR10, and CIFAR100, on both semi-supervised and supervised learning tasks.
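As a rough schematic of how such a joint prior can regularize CNN training (this is an illustration, not the paper's exact formulation), one can add a penalty on the inferred rendering path to the usual supervised objective. Here $z^{(1)},\dots,z^{(L)}$ denote the per-level latent variables, $\hat z_i^{(\ell)}$ their values inferred for input $x_i$, $p_\theta$ the parametrized joint prior, $f_\theta$ the CNN, $\ell$ a supervised loss, and $\lambda$ a trade-off weight; all of this notation is assumed for illustration only:

$$
\mathcal{L}(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n} \ell\big(f_\theta(x_i),\, y_i\big) \;-\; \lambda\,\frac{1}{n}\sum_{i=1}^{n} \log p_\theta\big(\hat z_i^{(1)},\dots,\hat z_i^{(L)}\big).
$$

The first term is the standard CNN training loss; the second plays the role of an RPN-style penalty, discouraging rendering paths that the joint prior deems unlikely.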
