The Principle of Logit Separation