Multi-Modal Pre-Training for Automated Speech Recognition
暂无分享,去创建一个
[1] Abdel-rahman Mohamed,et al. Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction , 2022, ICLR.
[2] Ruslan Salakhutdinov,et al. Hubert: How Much Can a Bad Teacher Benefit ASR Pre-Training? , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Aäron van den Oord,et al. Multimodal Self-Supervised Learning of General Audio Representations , 2021, ArXiv.
[4] Shih-Fu Chang,et al. VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text , 2021, NeurIPS.
[5] Cordelia Schmid,et al. ViViT: A Video Vision Transformer , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[6] Ronghang Hu,et al. UniT: Multimodal Multitask Learning with a Unified Transformer , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[7] Heng Wang,et al. Is Space-Time Attention All You Need for Video Understanding? , 2021, ICML.
[8] Neil Zeghidour,et al. Contrastive Learning of General-Purpose Audio Representations , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Tara N. Sainath,et al. RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions , 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[10] Ian Vince McLoughlin,et al. Semi-Supervised End-to-End ASR via Teacher-Student Learning with Conditional Posterior Distribution , 2020, INTERSPEECH.
[11] Andrew Zisserman,et al. Self-Supervised MultiModal Versatile Networks , 2020, NeurIPS.
[12] Yu Zhang,et al. Conformer: Convolution-augmented Transformer for Speech Recognition , 2020, INTERSPEECH.
[13] Yonghui Wu,et al. ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context , 2020, INTERSPEECH.
[14] Qian Zhang,et al. Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Edouard Grave,et al. End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures , 2019, ArXiv.
[16] Xiaofei Wang,et al. A Comparative Study on Transformer vs RNN in Speech Applications , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[17] Ruslan Salakhutdinov,et al. Multimodal Transformer for Unaligned Multimodal Language Sequences , 2019, ACL.
[18] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[19] Cordelia Schmid,et al. VideoBERT: A Joint Model for Video and Language Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[20] Sree Hari Krishnan Parthasarathi,et al. Improving Noise Robustness of Automatic Speech Recognition via Parallel Data and Teacher-student Learning , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[22] Sanjeev Khudanpur,et al. A Teacher-Student Learning Approach for Unsupervised Domain Adaptation of Sequence-Trained ASR Models , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[23] Andrew Zisserman,et al. A Short Note about Kinetics-600 , 2018, ArXiv.
[24] Jonathan Le Roux,et al. Student-teacher network learning with enhanced features , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[26] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[28] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.