Centered Weight Normalization in Accelerating Training of Deep Neural Networks
Lei Huang | Xianglong Liu | Yang Liu | Bo Lang | Dacheng Tao
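For quick reference, the reparameterization named in the title can be sketched in a few lines of NumPy: each weight vector v is centered by subtracting its mean and then normalized to unit length before being scaled by a learnable gain g. This is a minimal illustrative sketch; the function and variable names below are mine, not taken from the paper's code.

import numpy as np

def centered_weight_norm(v, g):
    # Centered weight normalization: w = g * (v - mean(v)) / ||v - mean(v)||_2,
    # so a neuron's effective incoming weights have zero mean and norm g.
    v_centered = v - v.mean()
    return g * v_centered / np.linalg.norm(v_centered)

# Toy usage: one (v, g) pair per output neuron of a fully connected layer.
rng = np.random.default_rng(0)
V = rng.normal(size=(4, 8))   # unconstrained parameters, one row per neuron
g = np.ones(4)                # learnable per-neuron scale
W = np.stack([centered_weight_norm(V[i], g[i]) for i in range(V.shape[0])])
print(W.mean(axis=1))              # approximately zero for every row
print(np.linalg.norm(W, axis=1))   # approximately g for every row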
[1] Razvan Pascanu, et al. Natural Neural Networks, 2015, NIPS.
[2] Yoshua Bengio, et al. Maxout Networks, 2013, ICML.
[3] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.
[4] Guo-Jun Qi, et al. Hierarchically Gated Deep Networks for Semantic Segmentation, 2016, CVPR.
[5] Roger B. Grosse, et al. A Kronecker-factored approximate Fisher matrix for convolution layers, 2016, ICML.
[6] Tim Salimans, et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks, 2016, NIPS.
[7] Venu Govindaraju, et al. Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks, 2016, ICML.
[8] Tapani Raiko, et al. Deep Learning Made Easier by Linear Transformations in Perceptrons, 2012, AISTATS.
[9] Dumitru Erhan, et al. Going deeper with convolutions, 2015, CVPR.
[10] Ruslan Salakhutdinov, et al. Data-Dependent Path Normalization in Neural Networks, 2015, ICLR.
[11] Yoshua Bengio, et al. Unitary Evolution Recurrent Neural Networks, 2015, ICML.
[12] Victor D. Dorobantu, et al. DizzyRNN: Reparameterizing Recurrent Neural Networks for Norm-Preserving Backpropagation, 2016, ArXiv.
[13] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[14] Ruslan Salakhutdinov, et al. Scaling up Natural Gradient by Sparsely Factorizing the Inverse Fisher Matrix, 2015, ICML.
[15] Nicol N. Schraudolph, et al. Accelerated Gradient Descent by Factor-Centering Decomposition, 1998.
[16] Surya Ganguli, et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.
[17] Ming Shao, et al. Deep Robust Encoder Through Locality Preserving Low-Rank Dictionary, 2016, ECCV.
[18] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, ICCV.
[19] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[20] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.
[21] Xiaogang Wang, et al. Deep Learning Face Representation by Joint Identification-Verification, 2014, NIPS.
[22] Frank Nielsen, et al. Relative Natural Gradient for Learning Large Complex Models, 2016, ArXiv.
[23] Misha Denil, et al. Predicting Parameters in Deep Learning, 2014.
[24] Andrew Y. Ng, et al. Reading Digits in Natural Images with Unsupervised Feature Learning, 2011.
[25] Les E. Atlas, et al. Full-Capacity Unitary Recurrent Neural Networks, 2016, NIPS.
[26] Geoffrey E. Hinton, et al. Rectified Linear Units Improve Restricted Boltzmann Machines, 2010, ICML.
[27] Hermann Ney, et al. Mean-normalized stochastic gradient for large-scale deep learning, 2014, ICASSP.
[28] Yann LeCun, et al. Efficient BackProp, 1998, Neural Networks: Tricks of the Trade.
[29] Jiri Matas, et al. All you need is a good init, 2015, ICLR.
[30] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[31] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[32] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[33] Le Song, et al. Deep Fried Convnets, 2015, ICCV.
[34] Geoffrey E. Hinton, et al. Layer Normalization, 2016, ArXiv.
[35] Xiaogang Wang, et al. Deep Learning Strong Parts for Pedestrian Detection, 2015, ICCV.
[36] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[37] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[38] Ilya Sutskever, et al. Training Deep and Recurrent Networks with Hessian-Free Optimization, 2012, Neural Networks: Tricks of the Trade.