Phoneme-to-Grapheme Conversion Based Large-Scale Pre-Training for End-to-End Automatic Speech Recognition
暂无分享,去创建一个
Akihiko Takashima | Ryo Masumura | Tomohiro Tanaka | Mana Ihori | Shota Orihashi | Naoki Makishima | Ryo Masumura | Tomohiro Tanaka | Naoki Makishima | Shota Orihashi | Mana Ihori | Akihiko Takashima
[1] Tatsuya Kawahara,et al. Leveraging Sequence-to-Sequence Speech Synthesis for Enhancing Acoustic-to-Word Speech Recognition , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[2] Karen Livescu,et al. Unsupervised Pre-Training of Bidirectional Speech Encoders via Masked Reconstruction , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Steve Renals,et al. A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition , 2015, INTERSPEECH.
[4] Yoshua Bengio,et al. End-to-end attention-based large vocabulary speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Samy Bengio,et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks , 2015, NIPS.
[6] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[8] Akihiko Takashima,et al. Sequence-Level Consistency Training for Semi-Supervised End-to-End Automatic Speech Recognition , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Ronan Collobert,et al. wav2vec: Unsupervised Pre-training for Speech Recognition , 2019, INTERSPEECH.
[10] Tomoharu Iwata,et al. Semi-Supervised End-to-End Speech Recognition , 2018, INTERSPEECH.
[11] Shinji Watanabe,et al. Multi-Modal Data Augmentation for End-to-end ASR , 2018, INTERSPEECH.
[12] Takanobu Oba,et al. Large Context End-to-end Automatic Speech Recognition via Extension of Hierarchical Recurrent Encoder-decoder Models , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Xu Tan,et al. MASS: Masked Sequence to Sequence Pre-training for Language Generation , 2019, ICML.
[14] Wei Zhao,et al. Denoising based Sequence-to-Sequence Pre-training for Text Generation , 2019, EMNLP.
[15] Hermann Ney,et al. Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Satoshi Nakamura,et al. Listening while speaking: Speech chain by deep learning , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[17] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[18] Liyuan Liu,et al. On the Variance of the Adaptive Learning Rate and Beyond , 2019, ICLR.
[19] Bhuvana Ramabhadran,et al. Direct Acoustics-to-Word Models for English Conversational Speech Recognition , 2017, INTERSPEECH.
[20] Awni Hannun,et al. Self-Training for End-to-End Speech Recognition , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Edouard Grave,et al. End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures , 2019, ArXiv.
[22] Navdeep Jaitly,et al. Towards Better Decoding and Language Model Integration in Sequence to Sequence Models , 2016, INTERSPEECH.
[23] Yusuke Ijima,et al. End-to-End Automatic Speech Recognition with a Reconstruction Criterion Using Speech-to-Text and Text-to-Speech Encoder-Decoders , 2019, INTERSPEECH.
[24] Ramón Fernández Astudillo,et al. Self-supervised Sequence-to-sequence ASR using Unpaired Speech and Text , 2019, INTERSPEECH.
[25] Yu Zhang,et al. Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM , 2017, INTERSPEECH.
[26] Hitoshi Isahara,et al. Spontaneous Speech Corpus of Japanese , 2000, LREC.
[27] Yuzong Liu,et al. Deep Contextualized Acoustic Representations for Semi-Supervised Speech Recognition , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[28] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Jonathan Le Roux,et al. Cycle-consistency Training for End-to-end Speech Recognition , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).