A Quantitative Analysis of the Effect of Batch Normalization on Gradient Descent
[1] Sergey Ioffe, et al. Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models, 2017, NIPS.
[2] Tim Salimans, et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, 2016, NIPS.
[3] S. Ponomarev. Submersions and preimages of sets of measure zero, 1987.
[4] Yousef Saad, et al. Iterative methods for sparse linear systems, 2003.
[5] Carla P. Gomes, et al. Understanding Batch Normalization, 2018, NeurIPS.
[6] Michael I. Jordan, et al. Gradient Descent Converges to Minimizers, 2016, ArXiv.
[7] Georgios Piliouras, et al. Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions, 2016, ITCS.
[8] M. Shub. Global Stability of Dynamical Systems, 1986.
[9] Simon Haykin, et al. Gradient-Based Learning Applied to Document Recognition, 2001.
[10] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[11] Sanjeev Arora, et al. Theoretical Analysis of Auto Rate-Tuning by Batch Normalization, 2018, ICLR.
[12] Roland Vollgraf, et al. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms, 2017, ArXiv.
[13] Diego Klabjan, et al. Convergence Analysis of Batch Normalization for Deep Neural Nets, 2017, ArXiv.
[14] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[15] Yair Carmon, et al. Lower bounds for finding stationary points I, 2017, Mathematical Programming.
[16] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[17] Aleksander Madry, et al. How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift), 2018, NeurIPS.
[18] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[19] Aaron C. Courville, et al. Recurrent Batch Normalization, 2016, ICLR.
[20] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[21] Thomas Hofmann, et al. Towards a Theoretical Understanding of Batch Normalization, 2018, ArXiv.
[22] Harold R. Parks, et al. A Primer of Real Analytic Functions, 1992.