Improving the Fusion of Acoustic and Text Representations in RNN-T
暂无分享,去创建一个
Tara N. Sainath | Shuo-yiin Chang | Bo Li | Chao Zhang | Zhiyun Lu | T. Sainath | Bo Li | Zhiyun Lu | Shuo-yiin Chang | Chao Zhang
[1] Frederick Jelinek,et al. Statistical methods for speech recognition , 1997 .
[2] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[3] Joshua B. Tenenbaum,et al. Separating Style and Content with Bilinear Models , 2000, Neural Computation.
[4] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[5] Charless C. Fowlkes,et al. Bilinear classifiers for visual recognition , 2009, NIPS.
[6] Alex Graves,et al. Sequence Transduction with Recurrent Neural Networks , 2012, ArXiv.
[7] Mike Schuster,et al. Japanese and Korean voice search , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[9] Steve Renals,et al. A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition , 2015, INTERSPEECH.
[10] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.
[11] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Yu Zhang,et al. Highway long short-term memory RNNS for distant speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Hairong Liu,et al. Exploring neural transducers for end-to-end speech recognition , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[14] Tara N. Sainath,et al. A Comparison of Sequence-to-Sequence Models for Speech Recognition , 2017, INTERSPEECH.
[15] Shinji Watanabe,et al. Joint CTC-attention based end-to-end speech recognition using multi-task learning , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Rohit Prabhavalkar,et al. Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[17] Stefan Lee,et al. Overcoming Language Priors in Visual Question Answering with Adversarial Regularization , 2018, NeurIPS.
[18] Yifan Gong,et al. Improving RNN Transducer Modeling for End-to-End Speech Recognition , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[19] Ehsan Variani,et al. A Density Ratio Approach to Language Model Fusion in End-to-End Automatic Speech Recognition , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[20] Tara N. Sainath,et al. Streaming End-to-end Speech Recognition for Mobile Devices , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Tara N. Sainath,et al. Joint Endpointing and Decoding with End-to-end Models , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Tara N. Sainath,et al. Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling , 2019, ArXiv.
[23] Yu Zhang,et al. Conformer: Convolution-augmented Transformer for Speech Recognition , 2020, INTERSPEECH.
[24] Qian Zhang,et al. Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] George Saon,et al. Alignment-Length Synchronous Decoding for RNN Transducer , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Tara N. Sainath,et al. Towards Fast and Accurate Streaming End-To-End ASR , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Hermann Ney,et al. A New Training Pipeline for an Improved Neural Transducer , 2020, INTERSPEECH.
[28] Quoc V. Le,et al. Specaugment on Large Scale Datasets , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Chao Zhang,et al. Multimodal Intelligence: Representation Learning, Information Fusion, and Applications , 2019, IEEE Journal of Selected Topics in Signal Processing.
[30] Zhong Meng,et al. Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability , 2020, INTERSPEECH.
[31] Xiaofeng Liu,et al. Rnn-Transducer with Stateless Prediction Network , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Cyril Allauzen,et al. Hybrid Autoregressive Transducer (HAT) , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Yu Wu,et al. Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] Brian Kingsbury,et al. Advancing RNN Transducer Technology for Speech Recognition , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] M. Seltzer,et al. Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion , 2021, Interspeech.
[36] Arun Narayanan,et al. Improving Streaming Automatic Speech Recognition with Non-Streaming Model Distillation on Unsupervised Data , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[37] Sining Sun,et al. Tiny Transducer: A Highly-Efficient Speech Recognition Model on Edge Devices , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[38] Naoyuki Kanda,et al. Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition , 2021, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] M. Seltzer,et al. Emformer: Efficient Memory Transformer Based Acoustic Model for Low Latency Streaming Speech Recognition , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[40] Chao Zhang,et al. Combination of Deep Speaker Embeddings for Diarisation , 2020, Neural Networks.
[41] D. Willett,et al. Using Synthetic Audio to Improve the Recognition of Out-of-Vocabulary Words in End-to-End Asr Systems , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[42] Tara N. Sainath,et al. Scaling End-to-End Models for Large-Scale Multilingual ASR , 2021, 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[43] Tara N. Sainath,et al. Cascaded Encoders for Unifying Streaming and Non-Streaming ASR , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[44] Philip C. Woodland,et al. Tree-Constrained Pointer Generator for End-to-End Contextual Speech Recognition , 2021, 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).