[1] Andrew L. Maas. Rectifier Nonlinearities Improve Neural Network Acoustic Models, 2013.
[2] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, IEEE International Conference on Computer Vision (ICCV).
[3] Persi Diaconis, et al. Iterated Random Functions, 1999, SIAM Rev.
[4] Yoram Singer, et al. Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity, 2016, NIPS.
[5] David Steinsaltz, et al. Locally Contractive Iterated Function Systems, 1999.
[6] Fei Wang, et al. Deep learning for healthcare: review, opportunities and challenges, 2018, Briefings Bioinform.
[7] Steve Kroon, et al. Critical initialisation for deep signal propagation in noisy rectifier neural networks, 2018, NeurIPS.
[8] Surya Ganguli, et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.
[9] Surya Ganguli, et al. Analyzing noise in autoencoders and deep networks, 2014, ArXiv.
[10] Robert C. Qiu, et al. Spectrum Concentration in Deep Residual Learning: A Free Probability Approach, 2018, IEEE Access.
[11] Praneeth Netrapalli, et al. Non-Gaussianity of Stochastic Gradient Noise, 2019, ArXiv.
[12] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[13] Yann Dauphin, et al. MetaInit: Initializing learning by learning to initialize, 2019, NeurIPS.
[14] Noel A. Cressie, et al. The moment generating function has its moments, 1986.
[15] Surya Ganguli, et al. Deep Information Propagation, 2016, ICLR.
[16] M. Nica, et al. Products of Many Large Random Matrices and Gradients in Deep Neural Networks, 2018, Communications in Mathematical Physics.
[17] Samuel S. Schoenholz, et al. Mean Field Residual Networks: On the Edge of Chaos, 2017, NIPS.
[18] J. van Leeuwen, et al. Neural Networks: Tricks of the Trade, 2002, Lecture Notes in Computer Science.
[19] Charles M. Newman, et al. The Stability of Large Random Matrices and Their Products, 1984.
[20] Geoffrey E. Hinton, et al. On the importance of initialization and momentum in deep learning, 2013, ICML.
[21] Andreas Veit, et al. Why are Adaptive Methods Good for Attention Models?, 2020, NeurIPS.
[22] Aaron Defazio, et al. Scaling Laws for the Principled Design, Initialization and Preconditioning of ReLU Networks, 2019, ArXiv.
[23] Levent Sagun, et al. A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks, 2019, ICML.
[24] J. Elton. A multiplicative ergodic theorem for Lipschitz maps, 1990.
[25] Sashank J. Reddi, et al. Why ADAM Beats SGD for Attention Models, 2019, ArXiv.
[26] A. Laforgia, et al. On the asymptotic expansion of a ratio of gamma functions, 2012.
[27] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[28] Lu Lu, et al. Dying ReLU and Initialization: Theory and Numerical Examples, 2019, Communications in Computational Physics.
[29] M. G. Bulmer, et al. Principles of Statistics, 1969.
[30] C. Walck. Hand-book on statistical distributions for experimentalists, 1996.
[31] Josef Hadar, et al. Rules for Ordering Uncertain Prospects, 1969.
[32] Tianqi Chen, et al. Empirical Evaluation of Rectified Activations in Convolutional Network, 2015, ArXiv.
[33] Fernando A. Mujica, et al. An Empirical Evaluation of Deep Learning on Highway Driving, 2015, ArXiv.
[34] David Rolnick, et al. How to Start Training: The Effect of Initialization and Architecture, 2018, NeurIPS.
[35] Jascha Sohl-Dickstein, et al. Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks, 2018, ICML.
[36] B. L. Kalman, et al. Why tanh: choosing a sigmoidal function, 1992, IJCNN International Joint Conference on Neural Networks.
[37] David Sussillo, et al. Random Walks: Training Very Deep Nonlinear Feed-Forward Networks with Smart Initialization, 2014, ArXiv.
[38] Gaël Richard, et al. On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks, 2019, ArXiv.
[39] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[40] S. Foss, et al. An Introduction to Heavy-Tailed and Subexponential Distributions, 2011.
[41] H. Sebastian Seung, et al. Variance-Preserving Initialization Schemes Improve Deep Network Training: But Which Variance is Preserved?, 2019, ArXiv.
[42] Daniel Soudry, et al. A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off, 2019, NeurIPS.
[43] Kyle L. Luther, et al. Sample Variance Decay in Randomly Initialized ReLU Networks, 2019.
[44] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] A. C. Berry. The accuracy of the Gaussian approximation to the sum of independent variates, 1941.
[46] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.
[47] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.
[48] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[49] Boris Hanin. Which Neural Net Architectures Give Rise to Exploding and Vanishing Gradients?, 2018, NeurIPS.
[50] S. Resnick. Heavy-Tail Phenomena: Probabilistic and Statistical Modeling, 2006.
[51] Jan Hendrik Witte, et al. Deep Learning for Finance: Deep Portfolios, 2016.
[52] Milan Merkle. Logarithmic convexity and inequalities for the gamma function, 1996.
[53] L. Arnold, et al. Lyapunov exponents of linear stochastic systems, 1986.
[54] Gill A. Pratt. Is a Cambrian Explosion Coming for Robotics?, 2015.
[55] Sebastian Mentemeier, et al. On multidimensional Mandelbrot cascades, 2014.
[56] Sepp Hochreiter, et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), 2015, ICLR.
[57] Michael W. Mahoney, et al. Traditional and Heavy-Tailed Self Regularization in Neural Network Models, 2019, ICML.
[58] Arnaud Doucet, et al. On the Selection of Initialization and Activation Function for Deep Neural Networks, 2018, ArXiv.