Sigsoftmax: Reanalysis of the Softmax Bottleneck
Sekitoshi Kanai | Yasuhiro Fujiwara | Yuki Yamanaka | Shuichi Adachi
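Since this page carries only the paper's metadata, a minimal NumPy sketch of the sigsoftmax output function the paper proposes may help orient the reader: sigsoftmax(z)_i = exp(z_i) σ(z_i) / Σ_j exp(z_j) σ(z_j). Computing it through log σ(z) = −softplus(−z) for numerical stability is my own arrangement, not code taken from the paper.

```python
import numpy as np

def sigsoftmax(z):
    """Sigsoftmax (Kanai et al.): exp(z_i) * sigmoid(z_i), normalized to sum to 1.

    Because log(exp(z) * sigmoid(z)) = z - softplus(-z) is nonlinear in z,
    the log-output matrix is not rank-limited the way log-softmax is,
    which is the paper's route around the softmax bottleneck.
    """
    logits = z - np.logaddexp(0.0, -z)  # z + log(sigmoid(z)), computed stably
    logits -= logits.max()              # safe shift: the final step is a plain softmax
    w = np.exp(logits)
    return w / w.sum()

# Example: a valid probability distribution that differs from softmax(z).
p = sigsoftmax(np.array([1.0, 2.0, 3.0]))
print(p, p.sum())  # sums to 1.0
```

Note that the max-shift must happen after adding log σ(z): unlike softmax, sigsoftmax itself is not shift-invariant, so subtracting a constant from z directly would change the result.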
[1] Wojciech Zaremba, et al. Recurrent Neural Network Regularization, 2014, ArXiv.
[2] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[3] Tomáš Mikolov. Statistical Language Models Based on Neural Networks, 2012.
[4] Ramón Fernández Astudillo, et al. From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification, 2016, ICML.
[5] Thorsten Brants, et al. One billion word benchmark for measuring progress in statistical language modeling, 2013, INTERSPEECH.
[6] William H. Press, et al. Numerical Recipes 3rd Edition: The Art of Scientific Computing, 2007.
[7] John Scott Bridle. Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition, 1989, NATO Neurocomputing.
[8] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[9] Michalis K. Titsias. One-vs-Each Approximation to Softmax for Scalable Estimation of Probabilities, 2016, NIPS.
[10] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[11] Ruslan Salakhutdinov, et al. Breaking the Softmax Bottleneck: A High-Rank RNN Language Model, 2017, ICLR.
[12] John S. Bridle. Training Stochastic Model Recognition Algorithms as Networks can Lead to Maximum Mutual Information Estimation of Parameters, 1989, NIPS.
[13] Richard Socher, et al. Pointer Sentinel Mixture Models, 2016, ICLR.
[14] Heekuck Oh, et al. Neural Networks for Pattern Recognition, 1993, Adv. Comput.
[15] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.
[16] Geoffrey E. Hinton, et al. Speech recognition with deep recurrent neural networks, 2013, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.
[18] Payman Mohassel, et al. SecureML: A System for Scalable Privacy-Preserving Machine Learning, 2017, IEEE Symposium on Security and Privacy (SP).
[19] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[20] Geoffrey E. Hinton, et al. Gated Softmax Classification, 2010, NIPS.
[21] Geoffrey E. Hinton, et al. Rectified Linear Units Improve Restricted Boltzmann Machines, 2010, ICML.
[22] Moustapha Cissé, et al. Efficient softmax approximation for GPUs, 2016, ICML.
[23] Pascal Vincent, et al. An Exploration of Softmax Alternatives Belonging to the Spherical Loss Family, 2015, ICLR.
[24] Razvan Pascanu, et al. On the difficulty of training recurrent neural networks, 2012, ICML.
[25] Junping Du, et al. Noisy Softmax: Improving the Generalization Ability of DCNN via Postponing the Early Softmax Saturation, 2017, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[27] Yann Ollivier. Riemannian metrics for neural networks I: feedforward networks, 2013, arXiv:1303.0818.
[28] Minjae Lee, et al. SVD-Softmax: Fast Softmax Approximation on Large Vocabulary Neural Networks, 2017, NIPS.
[29] Steve Renals, et al. Dynamic Evaluation of Neural Sequence Models, 2017, ICML.
[30] Christopher M. Bishop. Pattern Recognition and Machine Learning, 2006, Springer.
[31] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Yasuhiro Fujiwara, et al. Preventing Gradient Explosions in Gated Recurrent Units, 2017, NIPS.
[33] Richard Socher, et al. Regularizing and Optimizing LSTM Language Models, 2017, ICLR.