Minseok Kwon | Taewoo Lee | Jihyun Lee | Tae Gyoon Kang | Ho-Gyeong Kim | Kyoung-Gu Woo | Young Sang Choi | Min-Joong Lee | Seokyeong Jung | Yeona Hong | Jungin Lee | Jiseung Jeong | Hosik Lee
[1] George Kurian, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, ArXiv.
[2] Iain Murray, et al. BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning, 2019, ICML.
[3] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[4] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.
[5] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[6] Geoffrey E. Hinton, et al. Layer Normalization, 2016, ArXiv.
[7] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[8] Ho-Gyeong Kim, et al. Knowledge Distillation Using Output Errors for Self-attention End-to-end Models, 2019, ICASSP.
[9] Yoshua Bengio, et al. On Using Monolingual Corpora in Neural Machine Translation, 2015, ArXiv.
[10] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[11] Nam Soo Kim, et al. Adaptive Knowledge Distillation Based on Entropy, 2020, ICASSP.
[12] Razvan Pascanu, et al. Overcoming Catastrophic Forgetting in Neural Networks, 2017, Proceedings of the National Academy of Sciences.
[13] Samy Bengio, et al. Tensor2Tensor for Neural Machine Translation, 2018, AMTA.
[14] Michael McCloskey, et al. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem, 1989.
[15] Tara N. Sainath, et al. An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model, 2018, ICASSP.
[16] Andrea Vedaldi, et al. Efficient Parametrization of Multi-domain Deep Neural Networks, 2018, CVPR.
[17] Mona Attariyan, et al. Parameter-Efficient Transfer Learning for NLP, 2019, ICML.
[18] Shuang Xu, et al. Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition, 2018, ICASSP.
[19] Tara N. Sainath, et al. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models, 2018, ICASSP.
[20] Tara N. Sainath, et al. Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model, 2019, INTERSPEECH.