Adam Induces Implicit Weight Sparsity in Rectifier Neural Networks
Taiji Suzuki | Atsushi Yaguchi | Akiyuki Tanizawa | Yukinobu Sakata | Shuhei Nitta | Wataru Asano
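The title states the paper's claim: training rectifier (ReLU) networks with Adam tends to drive the incoming weights of many hidden units to zero, i.e. implicit row-wise weight sparsity. As a rough, self-contained illustration of how one might probe for that effect, the sketch below trains a small ReLU MLP with Adam and reports the fraction of hidden units whose incoming weights are all near zero. The synthetic data, network sizes, L2 strength (via `weight_decay`), and zero threshold are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch (not the paper's experimental setup): train a small
# ReLU MLP with Adam plus L2 regularization and measure how many hidden
# units end up with all incoming weights near zero.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic classification data: 10 classes, 256-dimensional inputs (assumed).
X = torch.randn(4096, 256)
y = torch.randint(0, 10, (4096,))

model = nn.Sequential(
    nn.Linear(256, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

# Adam with L2 regularization via weight_decay; the hyperparameters here
# are arbitrary choices for demonstration.
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):
    for i in range(0, len(X), 128):
        xb, yb = X[i:i + 128], y[i:i + 128]
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()
        opt.step()

# Count hidden units whose entire incoming weight row is (near) zero.
threshold = 1e-3  # assumed cutoff for "zero"
for name, layer in [("fc1", model[0]), ("fc2", model[2])]:
    row_max = layer.weight.detach().abs().max(dim=1).values
    dead = (row_max < threshold).float().mean().item()
    print(f"{name}: {dead:.1%} of units have all |w| < {threshold}")
```

Units whose incoming weight rows are entirely below the threshold contribute (almost) nothing after the ReLU and could in principle be pruned, which is the practical appeal of such implicit sparsity.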
[1] Andrew L. Maas. Rectifier Nonlinearities Improve Neural Network Acoustic Models, 2013.
[2] Ning Qian et al. On the momentum term in gradient descent learning algorithms, 1999, Neural Networks.
[3] Danilo Comminiello et al. Group sparse regularization for deep neural networks, 2016, Neurocomputing.
[4] Yoram Singer et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[5] Guigang Zhang et al. Deep Learning, 2016, Int. J. Semantic Comput.
[6] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.
[7] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[8] Nitish Srivastava et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.
[9] Alex Krizhevsky et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[10] Yiran Chen et al. Learning Structured Sparsity in Deep Neural Networks, 2016, NIPS.
[11] Yoshua Bengio et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[12] Lior Wolf et al. Channel-Level Acceleration of Deep Face Representations, 2015, IEEE Access.
[13] Claus Nebauer et al. Evaluation of convolutional neural networks for visual recognition, 1998, IEEE Trans. Neural Networks.
[14] Jian Sun et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Frank Hutter et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[16] Yoshua Bengio et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[17] Yoshua Bengio et al. Learning long-term dependencies with gradient descent is difficult, 1994, IEEE Trans. Neural Networks.
[18] Sanjiv Kumar et al. On the Convergence of Adam and Beyond, 2018.
[19] Sung Ju Hwang et al. Combined Group and Exclusive Sparsity for Deep Neural Networks, 2017, ICML.
[20] Hanan Samet et al. Pruning Filters for Efficient ConvNets, 2016, ICLR.
[21] Frank Hutter et al. Fixing Weight Decay Regularization in Adam, 2017, ArXiv.
[22] Shaohuai Shi et al. Speeding up Convolutional Neural Networks By Exploiting the Sparsity of Rectifier Units, 2017, ArXiv.
[23] Sergey Ioffe et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[24] Sepp Hochreiter et al. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs), 2015, ICLR.
[25] Jian Sun et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, IEEE International Conference on Computer Vision (ICCV).
[26] Yoshua Bengio et al. Deep Sparse Rectifier Neural Networks, 2011, AISTATS.
[27] Nathan Srebro et al. The Marginal Value of Adaptive Gradient Methods in Machine Learning, 2017, NIPS.
[28] Song Han et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.
[29] Jian Yang et al. Sparseness Analysis in the Pretraining of Deep Neural Networks, 2017, IEEE Transactions on Neural Networks and Learning Systems.