暂无分享,去创建一个
Rémi Gribonval | Hervé Jégou | Benjamin Graham | Pierre Stock | R. Gribonval | H. Jégou | Benjamin Graham | Pierre Stock
[1] Jason Weston,et al. Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..
[2] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[3] Sergey Ioffe,et al. Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models , 2017, NIPS.
[4] Kaiming He,et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour , 2017, ArXiv.
[5] Razvan Pascanu,et al. Natural Gradient Revisited , 2013, ICLR.
[6] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Anders Krogh,et al. A Simple Weight Decay Can Improve Generalization , 1991, NIPS.
[8] David Rolnick,et al. How to Start Training: The Effect of Initialization and Architecture , 2018, NeurIPS.
[9] Yoshua Bengio,et al. Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.
[10] Razvan Pascanu,et al. Natural Neural Networks , 2015, NIPS.
[11] Ruslan Salakhutdinov,et al. Path-SGD: Path-Normalized Optimization in Deep Neural Networks , 2015, NIPS.
[12] Roger B. Grosse,et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature , 2015, ICML.
[13] Nitish Srivastava,et al. Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.
[14] Geoffrey E. Hinton,et al. On the importance of initialization and momentum in deep learning , 2013, ICML.
[15] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[17] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[18] Jiri Matas,et al. All you need is a good init , 2015, ICLR.
[19] Klaus-Robert Müller,et al. Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.
[20] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[21] Boris Ginsburg,et al. Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification , 2017, ArXiv.
[22] Tim Salimans,et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks , 2016, NIPS.
[23] Venu Govindaraju,et al. Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks , 2016, ICML.
[24] Andrea Vedaldi,et al. Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.
[25] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Léon Bottou,et al. Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.
[27] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[28] Aleksander Madry,et al. How Does Batch Normalization Help Optimization? (No, It Is Not About Internal Covariate Shift) , 2018, NIPS 2018.
[29] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[30] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[31] Razvan Pascanu,et al. On the number of inference regions of deep feed forward networks with piece-wise linear activations , 2013, ICLR.
[32] Yoshua Bengio,et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001 .
[33] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .
[34] Surya Ganguli,et al. On the Expressive Power of Deep Neural Networks , 2016, ICML.
[35] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[36] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[37] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition , 2012 .
[38] Matus Telgarsky,et al. Benefits of Depth in Neural Networks , 2016, COLT.
[39] Yann Ollivier,et al. Practical Riemannian Neural Networks , 2016, ArXiv.
[40] Léon Bottou,et al. Diagonal Rescaling For Neural Networks , 2017, ArXiv.
[41] P. Tseng. Convergence of a Block Coordinate Descent Method for Nondifferentiable Minimization , 2001 .