How Much Self-Attention Do We Need? Trading Attention for Feed-Forward Layers