End-to-End ASR with Adaptive Span Self-Attention
Xuankai Chang | Aswin Shanmugam Subramanian | Pengcheng Guo | Shinji Watanabe | Yuya Fujita | Motoi Omachi