Shinji Watanabe | Daniel Garcia-Romero | Yuekai Zhang | Tomoki Hayashi | Pengcheng Guo | Chenda Li | Xuankai Chang | Hirofumi Inaguma | Kun Wei | Yosuke Higuchi | Wangyou Zhang | Jing Shi | Florian Boyer | Naoyuki Kamo | Jiatong Shi
[1] Quoc V. Le, et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition, 2019, INTERSPEECH.
[2] Shinji Watanabe, et al. Joint CTC-attention based end-to-end speech recognition using multi-task learning, 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Tao Qin, et al. FastSpeech 2: Fast and High-Quality End-to-End Text to Speech, 2021, ICLR.
[4] Ryuichi Yamamoto, et al. Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram, 2020, 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Xu Tan, et al. FastSpeech: Fast, Robust and Controllable Text to Speech, 2019, NeurIPS.
[6] Tomoki Toda, et al. ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit, 2020, 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Dong Yu, et al. Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks, 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[8] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[9] Shinji Watanabe, et al. Improving Transformer-Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration, 2019, INTERSPEECH.
[10] Taku Kudo, et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing, 2018, EMNLP.
[11] Shuang Xu, et al. Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Xiaofei Wang, et al. A Comparative Study on Transformer vs RNN in Speech Applications, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[13] Di He, et al. Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View, 2019, ArXiv.
[14] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[15] Jingbo Zhu, et al. Learning Deep Transformer Models for Machine Translation, 2019, ACL.
[16] Kevin Duh, et al. ESPnet-ST: All-in-One Speech Translation Toolkit, 2020, ACL.
[17] Shinji Watanabe, et al. ESPnet: End-to-End Speech Processing Toolkit, 2018, INTERSPEECH.
[18] Shinji Watanabe, et al. End-to-end Speech Recognition With Word-Based RNN Language Models, 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[19] Daniel Povey, et al. The Kaldi Speech Recognition Toolkit, 2011.
[20] Yu Zhang, et al. Conformer: Convolution-augmented Transformer for Speech Recognition, 2020, INTERSPEECH.
[21] Yann Dauphin, et al. Language Modeling with Gated Convolutional Networks, 2016, ICML.
[22] Sanjeev Khudanpur, et al. Audio augmentation for speech recognition, 2015, INTERSPEECH.
[23] Yi Tay, et al. Efficient Transformers: A Survey, 2020, ArXiv.
[24] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[25] Shujie Liu, et al. Neural Speech Synthesis with Transformer Network, 2018, AAAI.
[26] Adrian Łańcucki. FastPitch: Parallel Text-to-speech with Pitch Prediction, 2020, ArXiv.
[27] Nicholay Topin, et al. Super-convergence: very fast training of neural networks using large learning rates, 2018, Defense + Commercial Sensing.
[28] Jürgen Schmidhuber, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, 2006, ICML.
[29] Rémi Gribonval, et al. Performance measurement in blind audio source separation, 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[30] Hermann Ney, et al. A Comparison of Transformer and LSTM Encoder Decoder Models for ASR, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[31] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.