MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition
[1] Kyu J. Han, et al. Wav2Seq: Pre-Training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages, 2022, ICASSP 2023.
[2] Michael Auli, et al. Unified Speech-Text Pre-training for Speech Translation and Recognition, 2022, ACL.
[3] Jinyu Li, et al. Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data, 2022, INTERSPEECH.
[4] Michael Auli, et al. data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language, 2022, ICML.
[5] Jingren Zhou, et al. OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework, 2022, ICML.
[6] Ankur Bapna, et al. mSLAM: Massively multilingual joint pre-training for speech and text, 2022, ArXiv.
[7] Jinyu Li, et al. WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing, 2021, IEEE Journal of Selected Topics in Signal Processing.
[8] Rui Wang, et al. SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing, 2021, ACL.
[9] Lei Xie, et al. WenetSpeech: A 10000+ Hours Multi-Domain Mandarin Corpus for Speech Recognition, 2021, ICASSP 2022.
[10] Ankur Bapna, et al. SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training, 2021, ArXiv.
[11] Dmitriy Genzel, et al. Improving Speech Translation by Understanding and Learning from the Auxiliary Text Translation Task, 2021, ACL.
[12] Ruslan Salakhutdinov, et al. HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units, 2021, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[13] Heng Ji, et al. Learning Shared Semantic Space for Speech-to-Text Translation, 2021, Findings of ACL.
[14] Xianyan Jia, et al. M6: A Chinese Multimodal Pretrainer, 2021, ArXiv.
[15] Ilya Sutskever, et al. Learning Transferable Visual Models From Natural Language Supervision, 2021, ICML.
[16] Dmitriy Genzel, et al. A General Multi-Task Learning Framework to Leverage Text Data for Speech to Text Tasks, 2020, ICASSP 2021.
[17] J. Pino, et al. Fairseq S2T: Fast Speech-to-Text Modeling with Fairseq, 2020, AACL.
[18] Abdel-rahman Mohamed, et al. wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations, 2020, NeurIPS.
[19] Yu Zhang, et al. Conformer: Convolution-augmented Transformer for Speech Recognition, 2020, INTERSPEECH.
[20] Omer Levy, et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, 2019, ACL.
[21] Stefan Lee, et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, 2019, NeurIPS.
[22] Luke S. Zettlemoyer, et al. Transformers with convolutional context for ASR, 2019, ArXiv.
[23] Quoc V. Le, et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition, 2019, INTERSPEECH.
[24] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[25] Hui Bu, et al. AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale, 2018, ArXiv.
[26] Oriol Vinyals, et al. Representation Learning with Contrastive Predictive Coding, 2018, ArXiv.
[27] Hao Zheng, et al. AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline, 2017, O-COCOSDA.
[28] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[29] Sanjeev Khudanpur, et al. A study on data augmentation of reverberant speech for robust speech recognition, 2017, ICASSP.
[30] Quoc V. Le, et al. Listen, Attend and Spell, 2015, ArXiv.
[31] Wei Zhou. The Homophone Effect in Mandarin Word Recognition, 2015.
[32] Yoshua Bengio, et al. On Using Monolingual Corpora in Neural Machine Translation, 2015, ArXiv.
[33] Jürgen Schmidhuber, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, 2006, ICML.
[34] Philip Gage. A new algorithm for data compression, 1994.