Investigation of a Single-Channel Frequency-Domain Speech Enhancement Network to Improve End-to-End Bengali Automatic Speech Recognition Under Unseen Noisy Conditions
Yu Tsao | Wei-Ho Chung | Ryandhimas E. Zezario | Hsin-Min Wang | Supratip Ghose | Md Mahbub E. Noor | Yen-Ju Lu | Syu-Siang Wang | Chia-Yu Chang | Shafique Ahmed
[1] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[2] Shadrokh Samavi,et al. Modeling Teacher-Student Techniques in Deep Neural Networks for Knowledge Distillation , 2019, 2020 International Conference on Machine Vision and Image Processing (MVIP).
[3] Pabitra Mitra,et al. Bengali speech corpus for continuous automatic speech recognition system , 2011, 2011 International Conference on Speech Database and Assessments (Oriental COCOSDA).
[4] Xiaofei Wang,et al. A Comparative Study on Transformer vs RNN in Speech Applications , 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[5] Li Chai,et al. A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement , 2021, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[6] Shafayat Ahmed,et al. Improving End-to-End Bangla Speech Recognition with Semi-supervised Training , 2020, FINDINGS.
[7] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[8] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.
[9] Philipos C. Loizou,et al. Speech Enhancement: Theory and Practice , 2007 .
[10] Tillman Weyde,et al. Improved Speech Enhancement with the Wave-U-Net , 2018, ArXiv.
[11] Yu Zhang,et al. Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM , 2017, INTERSPEECH.
[12] Supheakmungkol Sarin,et al. Crowd-Sourced Speech Corpora for Javanese, Sundanese, Sinhala, Nepali, and Bangladeshi Bengali , 2018, SLTU.
[13] Masakiyo Fujimoto,et al. One-pass single-channel noisy speech recognition using a combination of noisy and enhanced features , 2019, INTERSPEECH.
[14] Tomohiro Nakatani,et al. Improving Noise Robust Automatic Speech Recognition with Single-Channel Time-Domain Enhancement Network , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Andries P. Hekstra,et al. Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[16] Jesper Jensen,et al. An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[17] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[18] Saeed Gazor,et al. An adaptive KLT approach for speech enhancement , 2001, IEEE Trans. Speech Audio Process..
[19] John R. Hershey,et al. Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks , 2015, INTERSPEECH.
[20] Yu Tsao,et al. Speech enhancement based on deep denoising autoencoder , 2013, INTERSPEECH.
[21] Ryandhimas E. Zezario,et al. Boosting Objective Scores of a Speech Enhancement Model by MetricGAN Post-processing , 2020, 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
[22] Jacob Benesty,et al. Springer handbook of speech processing , 2007, Springer Handbooks.
[23] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[24] Geoffrey E. Hinton,et al. Layer Normalization , 2016, ArXiv.
[25] DeLiang Wang,et al. A Tandem Algorithm for Pitch Estimation and Voiced Speech Segregation , 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[26] Jacob Benesty,et al. New insights into the noise reduction Wiener filter , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[27] Yu Tsao,et al. Incorporating Broad Phonetic Information for Speech Enhancement , 2020, INTERSPEECH.
[28] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Shinji Watanabe,et al. ESPnet: End-to-End Speech Processing Toolkit , 2018, INTERSPEECH.
[30] John R. Hershey,et al. Hybrid CTC/Attention Architecture for End-to-End Speech Recognition , 2017, IEEE Journal of Selected Topics in Signal Processing.
[31] Hermann Ney,et al. Investigation into Joint Optimization of Single Channel Speech Enhancement and Acoustic Modeling for Robust ASR , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).