Neural Rendering Model: Joint Generation and Prediction for Semi-Supervised Learning

Unsupervised and semi-supervised learning are important problems that are especially challenging with complex data such as natural images. Progress on these problems would accelerate if we had access to appropriate generative models under which to pose the associated inference tasks. Inspired by the success of Convolutional Neural Networks (CNNs) for supervised prediction in images, we design the Neural Rendering Model (NRM), a new probabilistic generative model whose inference calculations correspond to those in a given CNN architecture. The NRM uses the given CNN to design the prior distribution in the probabilistic model. Furthermore, the NRM generates images from coarse to fine scales. It introduces a small set of latent variables at each level and enforces dependencies among all the latent variables via a conjugate prior distribution. This conjugate prior yields a new regularizer for training CNNs based on the paths rendered in the generative model: the Rendering Path Normalization (RPN). We demonstrate that this regularizer improves generalization, both in theory and in practice. In addition, likelihood estimation in the NRM yields training losses for CNNs; inspired by this, we design a new loss, the Max-Min cross entropy, which outperforms the traditional cross-entropy loss for object classification. The Max-Min cross entropy suggests a new deep network architecture, namely the Max-Min network, which can learn from less labeled data while maintaining good prediction performance. Our experiments demonstrate that the NRM with the RPN and the Max-Min architecture matches or exceeds the state of the art on benchmarks including SVHN, CIFAR-10, and CIFAR-100 for semi-supervised and supervised learning tasks.
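To make the Max-Min idea concrete, below is a minimal PyTorch sketch of one plausible reading of a Max-Min branch pair and its combined loss. The exact construction is defined in the paper; the weight-sharing layout, the min(x, 0) activation for the second branch, and the way the two cross-entropy terms are combined here are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: the branch construction and loss combination are
# assumptions for illustration, not the NRM authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaxMinBranchPair(nn.Module):
    """Two weight-sharing branches: a 'max' branch with ReLU activations
    and a complementary 'min' branch using min(x, 0) activations."""

    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden_dim)      # shared by both branches
        self.fc2 = nn.Linear(hidden_dim, num_classes)  # shared classifier head

    def forward(self, x):
        h = self.fc1(x)
        logits_max = self.fc2(torch.relu(h))           # max branch: max(h, 0)
        logits_min = self.fc2(torch.clamp(h, max=0))   # min branch: min(h, 0)
        return logits_max, logits_min


def max_min_cross_entropy(logits_max, logits_min, target):
    # One plausible combination (assumption): the max branch is trained to
    # score the true class highly, while the min branch is trained so that
    # its negated logits favour the true class, i.e. it pushes down the
    # scores of the competing classes.
    ce_max = F.cross_entropy(logits_max, target)
    ce_min = F.cross_entropy(-logits_min, target)
    return ce_max + ce_min


if __name__ == "__main__":
    model = MaxMinBranchPair(in_dim=32 * 32 * 3, hidden_dim=256, num_classes=10)
    x = torch.randn(8, 32 * 32 * 3)
    y = torch.randint(0, 10, (8,))
    loss = max_min_cross_entropy(*model(x), y)
    loss.backward()
```

In a CNN setting the shared linear layers above would be replaced by the convolutional layers of the backbone, with the two branches differing only in which half of the rectification they keep.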
