ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context
暂无分享,去创建一个
Yonghui Wu | C. Chiu | Wei Han | Jiahui Yu | Wei Han | Zhengdong Zhang | Yu Zhang | James Qin | Anmol Gulati | Ruoming Pang | Chung-Cheng Chiu
[1] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[2] Lukás Burget,et al. iVector-based discriminative adaptation for automatic speech recognition , 2011, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding.
[3] Alex Graves,et al. Sequence Transduction with Recurrent Neural Networks , 2012, ArXiv.
[4] Hank Liao,et al. Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[5] George Saon,et al. Speaker adaptation of neural network acoustic models using i-vectors , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[6] Li-Rong Dai,et al. Fast Adaptation of Deep Neural Network Based on Discriminant Codes for Speech Recognition , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[7] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[9] Sanjeev Khudanpur,et al. JHU ASpIRE system: Robust LVCSR with TDNNS, iVector adaptation and RNN-LMS , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[10] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[11] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[13] Wei Li,et al. Streaming small-footprint keyword spotting using sequence-to-sequence models , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[14] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[15] Quoc V. Le,et al. Swish: a Self-Gated Activation Function , 2017, 1710.05941.
[16] François Chollet,et al. Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Yu Zhang,et al. Very deep convolutional networks for end-to-end speech recognition , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Rohit Prabhavalkar,et al. Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[19] Quoc V. Le,et al. Searching for Activation Functions , 2018, arXiv.
[20] Gang Sun,et al. Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks , 2018, NeurIPS.
[21] Mark Sandler,et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[22] Nicolas Usunier,et al. Fully Convolutional Speech Recognition , 2018, ArXiv.
[23] Xiaofei Wang,et al. A Comparative Study on Transformer vs RNN in Speech Applications , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[24] Edouard Grave,et al. End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures , 2019, ArXiv.
[25] Thomas Hain,et al. Unsupervised Adaptation of Acoustic Models for ASR Using Utterance-Level Embeddings from Squeeze and Excitation Networks , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[26] Ronan Collobert,et al. Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions , 2019, INTERSPEECH.
[27] Tara N. Sainath,et al. A Comparison of End-to-End Models for Long-Form Speech Recognition , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[28] Tara N. Sainath,et al. Streaming End-to-end Speech Recognition for Mobile Devices , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[30] Hermann Ney,et al. A Comparison of Transformer and LSTM Encoder Decoder Models for ASR , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[31] Boris Ginsburg,et al. Jasper: An End-to-End Convolutional Neural Acoustic Model , 2019, INTERSPEECH.
[32] Tara N. Sainath,et al. Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling , 2019, ArXiv.
[33] Qian Zhang,et al. Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] M. Seltzer,et al. Transformer-Based Acoustic Modeling for Hybrid Speech Recognition , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] Quoc V. Le,et al. Specaugment on Large Scale Datasets , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[36] Tara N. Sainath,et al. A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[37] Boris Ginsburg,et al. Quartznet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[38] Enhua Wu,et al. Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[39] Tara N. Sainath,et al. RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions , 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).