On the expected behaviour of noise regularised deep neural networks as Gaussian processes

Recent work has established the equivalence between infinitely wide deep neural networks and Gaussian processes (GPs), resulting in so-called neural network Gaussian processes (NNGPs). The behaviour of these models depends on the initialisation of the corresponding network. In this work, we consider the impact of noise regularisation (e.g. dropout) on NNGPs, and relate their behaviour to signal propagation theory in noise regularised deep neural networks. For ReLU activations, we find that the best performing NNGPs have kernel parameters that correspond to a recently proposed initialisation scheme for noise regularised ReLU networks. In addition, we show how the noise influences the covariance matrix of the NNGP, producing a stronger prior towards simple functions away from the training points. We verify our theoretical findings with experiments on MNIST and CIFAR-10 as well as on synthetic data.
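
To make the kernel construction concrete, below is a minimal NumPy sketch (an illustration, not the authors' implementation) of the layer-wise ReLU NNGP covariance recursion with multiplicative noise summarised by its second moment mu2 = E[eps^2]. The function name `relu_nngp_kernel`, the assumption that noise draws are independent across inputs, and the handling of the diagonal are modelling assumptions consistent with the signal propagation view described above; in that view, sigma_w^2 = 2/mu2 with sigma_b^2 = 0 (for dropout with keep probability p, mu2 = 1/p, giving sigma_w^2 = 2p) is the critical initialisation for noise regularised ReLU networks that the abstract refers to.

```python
import numpy as np

def relu_nngp_kernel(X, depth, sigma_w2=2.0, sigma_b2=0.0, mu2=1.0):
    """Layer-wise ReLU NNGP covariance recursion with multiplicative noise.

    A minimal sketch, not the authors' code. `mu2` is the second moment
    E[eps^2] of the multiplicative noise: mu2 = 1 recovers the noiseless
    NNGP, and dropout with keep probability p (kept units scaled by 1/p)
    gives mu2 = 1/p.
    """
    n, d_in = X.shape
    K = X @ X.T / d_in  # input-layer covariance K^0(x, x') = x.x' / d_in
    for _ in range(depth):
        diag = np.sqrt(np.diag(K))
        norm = np.outer(diag, diag) + 1e-12  # guard against zero-norm inputs
        cos_theta = np.clip(K / norm, -1.0, 1.0)
        theta = np.arccos(cos_theta)
        # Closed-form expectation E[relu(u) relu(v)] for (u, v) jointly
        # Gaussian with covariance given by the current K (arc-cosine kernel).
        E = norm * (np.sin(theta) + (np.pi - theta) * cos_theta) / (2.0 * np.pi)
        K = sigma_b2 + sigma_w2 * E
        # With E[eps] = 1 the cross-covariances are unchanged in expectation,
        # but the self-covariances are inflated by mu2.
        K[np.diag_indices(n)] = sigma_b2 + sigma_w2 * mu2 * np.diag(E)
    return K

# Example: dropout with keep probability p = 0.8 at the corresponding
# critical initialisation sigma_w^2 = 2 / mu2 = 2p, sigma_b^2 = 0.
X = np.random.randn(5, 10)
K = relu_nngp_kernel(X, depth=3, sigma_w2=2 * 0.8, sigma_b2=0.0, mu2=1 / 0.8)
```

Increasing mu2 inflates the diagonal of the covariance relative to its off-diagonal entries, which is one way to read the claim that noise produces a stronger prior towards simple functions away from the training points.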
