A Generative Adversarial Network Based Ensemble Technique for Automatic Evaluation of Machine Synthesized Speech

[1]  Aaron C. Courville,et al.  Generative Adversarial Networks , 2022, 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT).

[2]  R. Sebastian,et al.  Speech Cues and Social Evaluation: Markers of Ethnicity, Social Class, and Age , 2018, Recent Advances in Language, Communication, and Social Psychology.

[3]  S. R. Livingstone,et al.  The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English , 2018, PloS one.

[4]  Lirong Dai,et al.  Emotional statistical parametric speech synthesis using LSTM-RNNs , 2017, 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).

[5]  Alastair H. Moore,et al.  Speech enhancement for robust automatic speech recognition: Evaluation using a baseline system and instrumental measures , 2017, Comput. Speech Lang.

[6]  Samy Bengio,et al.  Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.

[7]  Adam Coates,et al.  Deep Voice: Real-time Neural Text-to-Speech , 2017, ICML.

[8]  Junichi Yamagishi,et al.  SUPERSEDED - CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit , 2016.

[9]  K. Harrington,et al.  Acoustic parameters of speech: Lack of correlation with perceptual and questionnaire‐based speech evaluation in patients with oral and oropharyngeal cancer treated with primary surgery , 2016, Head & neck.

[10]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Yoshua Bengio,et al.  Attention-Based Models for Speech Recognition , 2015, NIPS.

[12]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[13]  Hemant A. Patil,et al.  Fusion of magnitude and phase-based features for objective evaluation of TTS voice , 2014, The 9th International Symposium on Chinese Spoken Language Processing.

[14]  Carlos Busso,et al.  IEMOCAP: interactive emotional dyadic motion capture database , 2008, Lang. Resour. Evaluation.

[15]  A. Cook,et al.  Experimental evaluation of duration modelling techniques for automatic speech recognition , 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.