Large Scale Taxonomy Classification using BiLSTM with Self-Attention
暂无分享,去创建一个
[1] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[2] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[3] Yoshua Bengio,et al. Learning long-term dependencies with gradient descent is difficult , 1994, IEEE Trans. Neural Networks.
[4] Razvan Pascanu,et al. On the difficulty of training recurrent neural networks , 2012, ICML.
[5] Richard Socher,et al. Regularizing and Optimizing LSTM Language Models , 2017, ICLR.
[6] Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992 .
[7] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[8] Yoshua Bengio,et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001 .
[9] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[10] David M. Blei,et al. Stochastic Gradient Descent as Approximate Bayesian Inference , 2017, J. Mach. Learn. Res..
[11] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.