Multi-Quartznet: Multi-Resolution Convolution for Speech Recognition with Multi-Layer Feature Fusion
Jing Xiao | Jian Luo | Jianzong Wang | Ning Cheng | Guilin Jiang