暂无分享,去创建一个
Yashesh Gaur | Chengyi Wang | Shujie Liu | Yu Wu | Rui Zhao | Jinyu Li
[1] Yifan Gong,et al. Advancing Acoustic-to-Word CTC Model , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Xiaofei Wang,et al. A Comparative Study on Transformer vs RNN in Speech Applications , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[3] Yifan Gong,et al. Exploring Pre-Training with Alignments for RNN Transducer Based End-to-End Speech Recognition , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Jonathan Le Roux,et al. Triggered Attention for End-to-end Speech Recognition , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Tara N. Sainath,et al. A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Tara N. Sainath,et al. Streaming End-to-end Speech Recognition for Mobile Devices , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Zhong Meng,et al. High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Liang Lu,et al. Improving Layer Trajectory LSTM with Future Context Frames , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Tara N. Sainath,et al. A Comparison of Sequence-to-Sequence Models for Speech Recognition , 2017, INTERSPEECH.
[11] Hairong Liu,et al. Exploring neural transducers for end-to-end speech recognition , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[12] Yu Zhang,et al. Conformer: Convolution-augmented Transformer for Speech Recognition , 2020, INTERSPEECH.
[13] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[14] Colin Raffel,et al. Online and Linear-Time Attention by Enforcing Monotonic Alignments , 2017, ICML.
[15] Shuang Xu,et al. Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Shuang Xu,et al. Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese , 2018, INTERSPEECH.
[17] Zhong Meng,et al. Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability , 2020, INTERSPEECH.
[18] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.
[19] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[20] Qian Zhang,et al. Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] John Tran,et al. cuDNN: Efficient Primitives for Deep Learning , 2014, ArXiv.
[22] Chengyi Wang,et al. Semantic Mask for Transformer based End-to-End Speech Recognition , 2020, INTERSPEECH.
[23] Alex Graves,et al. Sequence Transduction with Recurrent Neural Networks , 2012, ArXiv.
[24] Colin Raffel,et al. Monotonic Chunkwise Attention , 2017, ICLR.
[25] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[26] Geoffrey E. Hinton,et al. Layer Normalization , 2016, ArXiv.
[27] Chong Wang,et al. Lookahead Convolution Layer for Unidirectional Recurrent Neural Networks , 2016 .
[28] Yashesh Gaur,et al. Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[30] Yoshua Bengio,et al. End-to-end attention-based large vocabulary speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Tara N. Sainath,et al. Two-Pass End-to-End Speech Recognition , 2019, INTERSPEECH.
[32] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.
[33] Rohit Prabhavalkar,et al. Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[34] Yifan Gong,et al. Improving RNN Transducer Modeling for End-to-End Speech Recognition , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[35] Kjell Schubert,et al. RNN-T For Latency Controlled ASR With Improved Beam Search , 2019, ArXiv.
[36] Yongqiang Wang,et al. Simplifying long short-term memory acoustic models for fast training and decoding , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[37] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[38] Tara N. Sainath,et al. A Comparison of End-to-End Models for Long-Form Speech Recognition , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[39] Tara N. Sainath,et al. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[40] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[41] Yajie Miao,et al. EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[42] Navdeep Jaitly,et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.