Improving Automatic Speech Recognition Utilizing Audio-codecs for Data Augmentation
[1] Will Song,et al. End-to-End Deep Neural Network for Automatic Speech Recognition, 2015.
[2] Zafar Rafii,et al. Lossy Audio Compression Identification , 2018, 2018 26th European Signal Processing Conference (EUSIPCO).
[3] Oliver Jokisch,et al. Quality Assessment of Two Fullband Audio Codecs Supporting Real-Time Communication , 2016, SPECOM.
[4] Yu Zhang,et al. Very deep convolutional networks for end-to-end speech recognition , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Yanning Zhang,et al. Hybrid Deep Neural Network–Hidden Markov Model (DNN-HMM) Based Speech Emotion Recognition , 2013, 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction.
[6] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[7] Francis M. Tyers,et al. Common Voice: A Massively-Multilingual Speech Corpus , 2020, LREC.
[8] Jan Skoglund,et al. Summary of Opus listening test results, 2013.
[9] Timothy B. Terriberry,et al. Definition of the Opus Audio Codec , 2012, RFC.
[10] Irina Illina,et al. New Paradigm in Speech Recognition: Deep Neural Networks , 2017, ICIS 2017.
[11] Timothy B. Terriberry,et al. High-Quality, Low-Delay Music Coding in the Opus Codec , 2016, ArXiv.
[12] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[13] Shinji Watanabe,et al. CNN-based Multichannel End-to-End Speech Recognition for Everyday Home Environments , 2019, 2019 27th European Signal Processing Conference (EUSIPCO).
[14] Andreas Nürnberger,et al. An Amharic Syllable-Based Speech Corpus for Continuous Speech Recognition , 2019, SLSP.
[15] Sanjeev Khudanpur,et al. Audio augmentation for speech recognition , 2015, INTERSPEECH.
[16] Chong Wang,et al. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[17] Oliver Jokisch,et al. Review of the Opus Codec in a WebRTC Scenario for Audio and Speech Communication , 2015, SPECOM.
[18] Geoffrey Zweig,et al. Advances in all-neural speech recognition , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Gerald Penn,et al. Convolutional Neural Networks for Speech Recognition , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[20] Paul Foulkes,et al. The 'Mobile Phone Effect' on Vowel Formants, 2004.
[21] Richard Socher,et al. Improved Regularization Techniques for End-to-End Speech Recognition , 2017, ArXiv.
[22] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Quoc V. Le,et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition , 2019, INTERSPEECH.
[24] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput.
[25] Navdeep Jaitly,et al. Vocal Tract Length Perturbation (VTLP) improves speech recognition, 2013.
[26] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res.
[27] Marina Bosi,et al. Introduction to Digital Audio Coding and Standards , 2004, J. Electronic Imaging.
[28] Ingo Siegert,et al. Emotion Intelligibility within Codec-Compressed and Reduced Bandwidth Speech , 2016, ITG Symposium on Speech Communication.
[29] Rui Zhang,et al. Addressee and Response Selection in Multi-Party Conversations with Speaker Interaction RNNs , 2017, AAAI.
[30] Biing-Hwang Juang,et al. Hidden Markov Models for Speech Recognition, 1991.
[31] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition, 2012.
[32] Karlheinz Brandenburg,et al. MP3 and AAC Explained, 1999.