Rohit Prabhavalkar | Trevor Strohman | Thibault Doutre | Chao Zhang | Zhiyun Lu | Liangliang Cao | Yanwei Pan
[1] Tara N. Sainath, et al. Minimum Word Error Rate Training for Attention-Based Sequence-to-Sequence Models, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Liangliang Cao, et al. Residual Energy-Based Models for End-to-End Speech Recognition, 2021, Interspeech 2021.
[3] Andreas Stolcke, et al. Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition, 2020, INTERSPEECH.
[4] Naoyuki Kanda, et al. On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer, 2020, Interspeech.
[5] Arun Narayanan, et al. Improving Streaming Automatic Speech Recognition with Non-Streaming Model Distillation on Unsupervised Data, 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Quoc V. Le, et al. Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition, 2020, ArXiv.
[7] Tara N. Sainath, et al. RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions, 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[8] Chengzhu Yu, et al. Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition, 2019, INTERSPEECH.
[9] Hermann Ney, et al. A New Training Pipeline for an Improved Neural Transducer, 2020, INTERSPEECH.
[10] Alex Graves, et al. Sequence Transduction with Recurrent Neural Networks, 2012, ArXiv.
[11] Tara N. Sainath, et al. Recognizing Long-Form Speech Using Streaming End-to-End Models, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[12] Brian Kingsbury, et al. Improvements to the IBM speech activity detection system for the DARPA RATS program, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Yu Zhang, et al. Conformer: Convolution-augmented Transformer for Speech Recognition, 2020, INTERSPEECH.
[14] Kyu J. Han, et al. Deep Learning-Based Telephony Speech Recognition in the Wild, 2017, INTERSPEECH.
[15] Matt Shannon, et al. Optimizing Expected Word Error Rate via Sampling for Speech Recognition, 2017, INTERSPEECH.
[16] Tara N. Sainath, et al. A Better and Faster end-to-end Model for Streaming ASR, 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Tara N. Sainath, et al. A Comparison of End-to-End Models for Long-Form Speech Recognition, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[18] David A. Patterson, et al. In-datacenter performance analysis of a tensor processing unit, 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[19] Brian Kingsbury, et al. On the limit of English conversational speech recognition, 2021, Interspeech.
[20] Andreas Stolcke, et al. The Microsoft 2017 Conversational Speech Recognition System, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Yifan Gong, et al. Improving RNN Transducer Modeling for End-to-End Speech Recognition, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[22] Geoffrey E. Hinton, et al. Speech recognition with deep recurrent neural networks, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[23] Tara N. Sainath, et al. Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling, 2019, ArXiv.
[24] Leibny Paola García-Perera, et al. End-to-End Domain-Adversarial Voice Activity Detection, 2019, INTERSPEECH.
[25] Tara N. Sainath, et al. Less is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging, 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Jonathan Le Roux, et al. Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error, 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[27] Xiaodong Cui, et al. Reducing Exposure Bias in Training Recurrent Neural Network Transducers, 2021, Interspeech 2021.
[28] Takaaki Hori, et al. Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers, 2021, Interspeech.
[29] Brian Kingsbury, et al. Advancing RNN Transducer Technology for Speech Recognition, 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[31] Olivier Siohan, et al. Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models, 2021, Interspeech 2021.
[32] Quoc V. Le, et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition, 2019, INTERSPEECH.
[33] Marc'Aurelio Ranzato, et al. Sequence Level Training with Recurrent Neural Networks, 2015, ICLR.
[34] A Lightweight Framework for Online Voice Activity Detection in the Wild, 2021, Interspeech.