Data augmentation using prosody and false starts to recognize non-native children's speech
[1] Andreas Stolcke, et al. SRILM - an extensible language modeling toolkit, 2002, INTERSPEECH.
[2] Syed Shahnawazuddin, et al. Explicit Pitch Mapping for Improved Children's Speech Recognition, 2017, Circuits, Systems, and Signal Processing.
[3] Andreas Stolcke, et al. Language Modeling of Nonverbal Vocalizations in Spontaneous Speech, 2012, TSD.
[4] Syed Shahnawazuddin, et al. Effect of Prosody Modification on Children's ASR, 2017, IEEE Signal Processing Letters.
[5] Lonce L. Wyse, et al. Real-Time Signal Estimation From Modified Short-Time Fourier Transform Magnitude Spectra, 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[6] Yiming Wang, et al. Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks, 2018, INTERSPEECH.
[7] Herbert Gish, et al. A parametric approach to vocal tract length normalization, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[8] Syed Shahnawazuddin, et al. Role of Prosodic Features on Children's Speech Recognition, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] George Saon, et al. Speaker adaptation of neural network acoustic models using i-vectors, 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[10] Quoc V. Le, et al. SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition, 2019, INTERSPEECH.
[11] Helena Moniz, et al. Classification of disfluent phenomena as fluent communicative devices in specific prosodic contexts, 2009, INTERSPEECH.
[12] Andreas Stolcke, et al. Statistical language modeling for speech disfluencies, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[13] S. Shahnawazuddin, et al. Improving Children's Speech Recognition Through Time Scale Modification Based Speaking Rate Adaptation, 2018, 2018 International Conference on Signal Processing and Communications (SPCOM).
[14] Jae Lim, et al. Signal estimation from modified short-time Fourier transform, 1984.
[15] Syed Shahnawazuddin, et al. Improving Children's Speech Recognition Through Explicit Pitch Scaling Based on Iterative Spectrogram Inversion, 2017, INTERSPEECH.
[16] Mari Ostendorf, et al. Modeling disfluencies in conversational speech, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.
[17] Daniel Povey, et al. The Kaldi Speech Recognition Toolkit, 2011.
[18] Daniele Falavigna, et al. TLT-school: a Corpus of Non Native Children Speech, 2020, LREC.
[19] Jozef Juhar, et al. Adding filled pauses and disfluent events into language models for speech recognition, 2016, 2016 7th IEEE International Conference on Cognitive Infocommunications (CogInfoCom).
[20] S. Shahnawazuddin, et al. Creating speaker independent ASR system through prosody modification based data augmentation, 2020, Pattern Recognit. Lett.
[21] Andreas Stolcke, et al. Automatic disfluency identification in conversational speech using multiple knowledge sources, 2003, INTERSPEECH.