The exploding gradient problem demystified - definition, prevalence, impact, origin, tradeoffs, and solutions