Investigating the Impact of Spectral and Temporal Degradation on End-to-End Automatic Speech Recognition Performance