Keqi Deng | Long Ma | Songjun Cao | Yike Zhang
[1] Kuan-Yu Chen, et al. Non-autoregressive Transformer-based End-to-end ASR using BERT, 2021, arXiv.
[2] Alexei Baevski, et al. vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations, 2019, ICLR.
[3] Keqi Deng, et al. Improving Accent Identification and Accented Speech Recognition Under a Framework of Self-supervised Learning, 2021, INTERSPEECH.
[4] Gabriel Synnaeve, et al. Self-Training and Pre-Training are Complementary for Speech Recognition, 2021, ICASSP.
[5] Yonghong Yan, et al. Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Text Data, 2021, ICASSP.
[6] Lysandre Debut, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, arXiv.
[7] Ning Cheng, et al. Applying wav2vec2.0 to Speech Recognition in various low-resource languages, 2020, arXiv.
[8] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[9] Tara N. Sainath, et al. An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model, 2018, ICASSP.
[10] Sebastian Ruder, et al. Universal Language Model Fine-tuning for Text Classification, 2018, ACL.
[11] Yonghong Yan, et al. History Utterance Embedding Transformer LM for Speech Recognition, 2021, ICASSP.
[12] Shiyu Zhou, et al. Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-Resource Speech Recognition, 2021, IEEE Signal Processing Letters.
[13] Tatsuya Kawahara, et al. Distilling the Knowledge of BERT for Sequence-to-Sequence ASR, 2020, INTERSPEECH.
[14] Shinji Watanabe, et al. ESPnet: End-to-End Speech Processing Toolkit, 2018, INTERSPEECH.
[15] Hui Bu, et al. AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale, 2018, arXiv.
[16] Gabriel Synnaeve, et al. Joint Masked CPC and CTC Training for ASR, 2020, arXiv.
[17] Brian Kingsbury, et al. Leveraging Unpaired Text Data for Training End-To-End Speech-to-Intent Systems, 2020, ICASSP.
[18] Ronan Collobert, et al. wav2vec: Unsupervised Pre-training for Speech Recognition, 2019, INTERSPEECH.
[19] John R. Hershey, et al. Hybrid CTC/Attention Architecture for End-to-End Speech Recognition, 2017, IEEE Journal of Selected Topics in Signal Processing.
[20] Quoc V. Le, et al. Unsupervised Pretraining for Sequence to Sequence Learning, 2016, EMNLP.
[21] Hao Zheng, et al. AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline, 2017, O-COCOSDA.
[22] Navdeep Jaitly, et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks, 2014, ICML.
[23] Thomas Wolf, et al. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter, 2019, arXiv.
[24] Jianhua Tao, et al. Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data, 2021, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[25] Xin Zhao, et al. UER: An Open-Source Toolkit for Pre-training Models, 2019, EMNLP.
[26] Quoc V. Le, et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, 2016, ICASSP.
[27] Alexei Baevski, et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations, 2020, NeurIPS.
[28] Myle Ott, et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling, 2019, NAACL.
[29] John R. Hershey, et al. Joint CTC/attention decoding for end-to-end speech recognition, 2017, ACL.
[30] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[31] Quoc V. Le, et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition, 2019, INTERSPEECH.
[32] Adam Coates, et al. Cold Fusion: Training Seq2Seq Models Together with Language Models, 2017, INTERSPEECH.
[33] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, arXiv.
[34] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[35] Jürgen Schmidhuber, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, 2006, ICML.
[36] Wen Gao, et al. Semantics of the Unwritten, 2020, arXiv.
[37] Yan Li, et al. The Speechtransformer for Large-scale Mandarin Chinese Speech Recognition, 2019, ICASSP.
[38] Christian Igel, et al. On Scaling Contrastive Representations for Low-Resource Speech Recognition, 2021, ICASSP.
[39] Alexei A. Efros, et al. Unsupervised Visual Representation Learning by Context Prediction, 2015, ICCV.
[40] Sining Sun, et al. Improving Streaming Transformer Based ASR Under a Framework of Self-supervised Learning, 2021, INTERSPEECH.
[41] Daniel Povey, et al. The Kaldi Speech Recognition Toolkit, 2011.