E2E-SINCNET: Toward Fully End-To-End Speech Recognition
Titouan Parcollet | Mohamed Morchid | Georges Linarès
[1] Yoshua Bengio, et al. Attention-Based Models for Speech Recognition, 2015, NIPS.
[2] Shuang Xu, et al. Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.
[4] Shinji Watanabe, et al. Joint CTC-attention based end-to-end speech recognition using multi-task learning, 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Shinji Watanabe, et al. ESPnet: End-to-End Speech Processing Toolkit, 2018, INTERSPEECH.
[6] Daniel Povey, et al. The Kaldi Speech Recognition Toolkit, 2011.
[7] Christopher D. Manning, et al. Effective Approaches to Attention-based Neural Machine Translation, 2015, EMNLP.
[8] Yu Zhang, et al. Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM, 2017, INTERSPEECH.
[9] Ying Zhang, et al. Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks, 2016, INTERSPEECH.
[10] Steve Renals, et al. On Learning Interpretable CNNs with Parametric Modulated Kernel-Based Filters, 2019, INTERSPEECH.
[11] Yoshua Bengio, et al. Speaker Recognition from Raw Waveform with SincNet, 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[12] Nicolas Usunier, et al. End-to-End Speech Recognition From the Raw Waveform, 2018, INTERSPEECH.
[13] Yoshua Bengio, et al. Interpretable Convolutional Filters with SincNet, 2018, ArXiv.
[14] Yoshua Bengio, et al. Speech and Speaker Recognition from Raw Waveform with SincNet, 2018, ArXiv.
[15] Navdeep Jaitly, et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks, 2014, ICML.
[16] Zhiheng Huang, et al. Self-attention Networks for Connectionist Temporal Classification in Speech Recognition, 2019, 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Satoshi Nakamura, et al. Attention-based Wav2Text with feature transfer learning, 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[18] Titouan Parcollet, et al. The PyTorch-Kaldi Speech Recognition Toolkit, 2018, 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Hermann Ney, et al. Acoustic modeling with deep neural networks using raw time signal for LVCSR, 2014, INTERSPEECH.
[20] Shinji Watanabe, et al. Improving Transformer-Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration, 2019, INTERSPEECH.
[21] Ron J. Weiss, et al. Speech acoustic modeling from raw multichannel waveforms, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Dimitri Palaz, et al. End-to-end Phoneme Sequence Recognition using Convolutional Neural Networks, 2013, ArXiv.
[23] Yoshua Bengio, et al. End-to-end attention-based large vocabulary speech recognition, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Sabrina Hirsch, et al. Digital Signal Processing A Computer Based Approach, 2016.
[25] Nanyun Peng, et al. Espresso: A Fast End-to-End Neural Speech Recognition Toolkit, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[26] Boris Ginsburg, et al. Jasper: An End-to-End Convolutional Neural Acoustic Model, 2019, INTERSPEECH.
[27] Jonathan G. Fiscus, et al. DARPA TIMIT: acoustic-phonetic continuous speech corpus CD-ROM, NIST speech disc 1-1.1, 1993.
[28] Jürgen Schmidhuber, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, 2006, ICML.