Minimum latency training of sequence transducers for streaming end-to-end speech recognition
暂无分享,去创建一个
[1] Jinyu Li. Recent Advances in End-to-End Automatic Speech Recognition , 2021, APSIPA Transactions on Signal and Information Processing.
[2] Shinji Watanabe,et al. A Study of Transducer Based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies , 2021, 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[3] Tim Fingscheidt,et al. Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition , 2021, 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[4] Hasim Sak,et al. Reducing Streaming ASR Model Delay with Self Alignment , 2021, Interspeech.
[5] Rohit Prabhavalkar,et al. Dissecting User-Perceived Latency of On-Device E2E Speech Recognition , 2021, Interspeech.
[6] Tara N. Sainath,et al. A Better and Faster end-to-end Model for Streaming ASR , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Hang Su,et al. Alignment Restricted Streaming Recurrent Neural Network Transducer , 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[8] Shinji Watanabe,et al. Recent Developments on Espnet Toolkit Boosted By Conformer , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Yu Wu,et al. Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Tara N. Sainath,et al. FastEmit: Low-Latency Streaming ASR with Sequence-Level Emission Regularization , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] M. Seltzer,et al. Emformer: Efficient Memory Transformer Based Acoustic Model for Low Latency Streaming Speech Recognition , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Shinji Watanabe,et al. Streaming Transformer Asr With Blockwise Synchronous Beam Search , 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[13] Tara N. Sainath,et al. Emitting Word Timings with End-to-End Models , 2020, INTERSPEECH.
[14] Zhong Meng,et al. Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability , 2020, INTERSPEECH.
[15] Yu Zhang,et al. Conformer: Convolution-augmented Transformer for Speech Recognition , 2020, INTERSPEECH.
[16] Tara N. Sainath,et al. Towards Fast and Accurate Streaming End-To-End ASR , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Jinyu Li,et al. Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Tara N. Sainath,et al. A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Qian Zhang,et al. Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Kjell Schubert,et al. Transformer-Transducer: End-to-End Speech Recognition with Self-Attention , 2019, ArXiv.
[21] Tara N. Sainath,et al. Joint Endpointing and Decoding with End-to-end Models , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Hermann Ney,et al. Improved training of end-to-end attention models for speech recognition , 2018, INTERSPEECH.
[23] Shinji Watanabe,et al. ESPnet: End-to-End Speech Processing Toolkit , 2018, INTERSPEECH.
[24] Colin Raffel,et al. Monotonic Chunkwise Attention , 2017, ICLR.
[25] John R. Hershey,et al. Hybrid CTC/Attention Architecture for End-to-End Speech Recognition , 2017, IEEE Journal of Selected Topics in Signal Processing.
[26] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.
[27] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[28] Alex Graves,et al. Sequence Transduction with Recurrent Neural Networks , 2012, ArXiv.
[29] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[30] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[31] Janet M. Baker,et al. The Design for the Wall Street Journal-based CSR Corpus , 1992, HLT.