Lei Huang | Li Liu | Ling Shao | Jie Qin | Fan Zhu
[1] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[2] Carla P. Gomes, et al. Understanding Batch Normalization, 2018, NeurIPS.
[3] Elad Hoffer, et al. Norm matters: efficient and accurate normalization schemes in deep networks, 2018, NeurIPS.
[4] Guillaume Hennequin, et al. Exact natural gradient in deep linear networks and its application to the nonlinear case, 2018, NeurIPS.
[5] Lei Huang, et al. Centered Weight Normalization in Accelerating Training of Deep Neural Networks, 2017, ICCV.
[6] Yann Dauphin, et al. Empirical Analysis of the Hessian of Over-Parametrized Neural Networks, 2017, ICLR.
[7] Nicol N. Schraudolph, et al. Accelerated Gradient Descent by Factor-Centering Decomposition, 1998.
[8] Razvan Pascanu, et al. Revisiting Natural Gradient for Deep Networks, 2013, ICLR.
[9] Thomas Hofmann, et al. Towards a Theoretical Understanding of Batch Normalization, 2018, arXiv.
[10] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[11] Roger B. Grosse, et al. A Kronecker-factored approximate Fisher matrix for convolution layers, 2016, ICML.
[12] Aleksander Madry, et al. How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift), 2018, NeurIPS.
[13] Ilya Sutskever, et al. Estimating the Hessian by Back-propagating Curvature, 2012, ICML.
[14] Miguel Á. Carreira-Perpiñán, et al. Distributed optimization of deeply nested systems, 2012, AISTATS.
[15] Shankar Krishnan, et al. An Investigation into Neural Net Optimization via Hessian Eigenvalue Density, 2019, ICML.
[16] James Martens, et al. Deep learning via Hessian-free optimization, 2010, ICML.
[17] Frank Nielsen, et al. Relative Fisher Information and Natural Gradient for Learning Large Modular Models, 2017, ICML.
[18] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, ICCV.
[19] Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.
[20] Yuan Xie, et al. L1-Norm Batch Normalization for Efficient Training of Deep Neural Networks, 2018, IEEE Transactions on Neural Networks and Learning Systems.
[21] Vardan Papyan, et al. The Full Spectrum of Deep Net Hessians At Scale: Dynamics with Sample Size, 2018, arXiv.
[22] Surya Ganguli, et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.
[23] Nicolas Le Roux, et al. Topmoumoute Online Natural Gradient Algorithm, 2007, NIPS.
[24] Kilian Q. Weinberger, et al. Densely Connected Convolutional Networks, 2017, CVPR.
[25] Klaus-Robert Müller, et al. Deep Boltzmann Machines and the Centering Trick, 2012, Neural Networks: Tricks of the Trade.
[26] Jascha Sohl-Dickstein, et al. A Mean Field Theory of Batch Normalization, 2019, ICLR.
[27] Roger B. Grosse, et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature, 2015, ICML.
[28] Lei Huang, et al. Decorrelated Batch Normalization, 2018, CVPR.
[29] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.
[30] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, arXiv.
[31] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[32] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[33] Roger B. Grosse, et al. Distributed Second-Order Optimization using Kronecker-Factored Approximations, 2016, ICLR.
[34] James Martens, et al. New perspectives on the natural gradient method, 2014, arXiv.
[35] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[36] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[37] Yann LeCun, et al. Second Order Properties of Error Surfaces: Learning Time and Generalization, 1990, NIPS.
[38] Charles R. Johnson, et al. Topics in Matrix Analysis, 1991.
[39] Nikos Komodakis, et al. Wide Residual Networks, 2016, BMVC.
[40] Hermann Ney, et al. A Convergence Analysis of Log-Linear Training, 2011, NIPS.
[41] Michael Möller, et al. Proximal Backpropagation, 2017, ICLR.