JEIT: Joint End-to-End Model and Internal Language Model Training for Speech Recognition
Tara N. Sainath, B. Ramabhadran, Rohit Prabhavalkar, Bo Li, A. Rosenberg, Ehsan Variani, Weiran Wang, Yu Zhang, Tongzhou Chen, Zhong Meng
[1] B. Ramabhadran et al. Modular Hybrid Autoregressive Transducer, 2022, 2022 IEEE Spoken Language Technology Workshop (SLT).
[2] Tara N. Sainath et al. JOIST: A Joint Speech and Text Streaming Model for ASR, 2022, 2022 IEEE Spoken Language Technology Workshop (SLT).
[3] Tara N. Sainath et al. A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes, 2022, Interspeech.
[4] Michael Auli et al. Unified Speech-Text Pre-training for Speech Translation and Recognition, 2022, ACL.
[5] H. Zen et al. MAESTRO: Matched Speech Text Representations through Modality Matching, 2022, Interspeech.
[6] Brian Kingsbury et al. Towards Reducing the Need for Speech Training Data to Build Spoken Language Understanding Systems, 2022, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Ankur Bapna et al. SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training, 2021, arXiv.
[8] Xie Chen et al. Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition, 2021, Interspeech.
[9] Jinyu Li et al. Factorized Neural Transducer for Efficient Language Model Adaptation, 2021, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Jinyu Li et al. Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS, 2021, Interspeech.
[11] Tara N. Sainath et al. An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling, 2021, Interspeech.
[12] Tara N. Sainath et al. Tied & Reduced RNN-T Decoder, 2021, Interspeech.
[13] Yonghong Yan et al. Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Text Data, 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Naoyuki Kanda et al. Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition, 2021, Interspeech.
[15] Janne Pylkkönen et al. Fast Text-Only Domain Adaptation of RNN-Transducer Prediction Network, 2021, Interspeech.
[16] Hermann Ney et al. Librispeech Transducer Model with Internal Language Model Prior Correction, 2021, Interspeech.
[17] Naoyuki Kanda et al. Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition, 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] D. Willett et al. Using Synthetic Audio to Improve the Recognition of Out-of-Vocabulary Words in End-to-End ASR Systems, 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Naoyuki Kanda et al. Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition, 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[20] Tara N. Sainath et al. Cascaded Encoders for Unifying Streaming and Non-Streaming ASR, 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Zhong Meng et al. Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability, 2020, Interspeech.
[22] Hermann Ney et al. A New Training Pipeline for an Improved Neural Transducer, 2020, Interspeech.
[23] Xiaofeng Liu et al. RNN-Transducer with Stateless Prediction Network, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Tara N. Sainath et al. A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Cyril Allauzen et al. Hybrid Autoregressive Transducer (HAT), 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Ehsan Variani et al. A Density Ratio Approach to Language Model Fusion in End-to-End Automatic Speech Recognition, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[27] M. Seltzer et al. RNN-T for Latency Controlled ASR with Improved Beam Search, 2019, arXiv.
[28] Tara N. Sainath et al. Recognizing Long-Form Speech Using Streaming End-to-End Models, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[29] Bhuvana Ramabhadran et al. Speech Recognition with Augmented Synthesized Speech, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[30] Tara N. Sainath et al. Shallow-Fusion End-to-End Contextual Biasing, 2019, Interspeech.
[31] Quoc V. Le et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition, 2019, Interspeech.
[32] Tara N. Sainath et al. Minimum Word Error Rate Training for Attention-Based Sequence-to-Sequence Models, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Tara N. Sainath et al. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] Tara N. Sainath et al. Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home, 2017, Interspeech.
[35] Alexander Gutkin et al. Recent Advances in Google Real-Time HMM-Driven Unit Selection Synthesizer, 2016, Interspeech.
[36] Naoyuki Kanda et al. Maximum a Posteriori Based Decoding for CTC Acoustic Models, 2016, Interspeech.
[37] Yoshua Bengio et al. On Using Monolingual Corpora in Neural Machine Translation, 2015, arXiv.
[38] Erich Elsen et al. Deep Speech: Scaling up End-to-End Speech Recognition, 2014, arXiv.
[39] Hank Liao et al. Large Scale Deep Neural Network Acoustic Modeling with Semi-Supervised Training Data for YouTube Video Transcription, 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[40] Yifan Gong et al. Improving Wideband Speech Recognition Using Mixed-Bandwidth Training Data in CD-DNN-HMM, 2012, 2012 IEEE Spoken Language Technology Workshop (SLT).
[41] Mike Schuster et al. Japanese and Korean Voice Search, 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[42] R. A. Leibler et al. On Information and Sufficiency, 1951.