Paper Information - A Generative Adversarial Network Based Ensemble Technique for Automatic Evaluation of Machine Synthesized Speech - 字舞流文

A Generative Adversarial Network Based Ensemble Technique for Automatic Evaluation of Machine Synthesized Speech

Partha Pratim Roy | Balasubramanian Raman | Puneet Kumar | Shashank Kashyap | Ashutosh Chaubey | Jaynil Jaiswal | Sasi Kiran Reddy Bhimavarapu

[1] Aaron C. Courville, et al. Generative Adversarial Networks, 2022, 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT).

[2] R. Sebastian, et al. Speech Cues and Social Evaluation: Markers of Ethnicity, Social Class, and Age, 2018, Recent Advances in Language, Communication, and Social Psychology.

[3] S. R. Livingstone, et al. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English, 2018, PloS one.

[4] Lirong Dai, et al. Emotional statistical parametric speech synthesis using LSTM-RNNs, 2017, 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).

[5] Alastair H. Moore, et al. Speech enhancement for robust automatic speech recognition: Evaluation using a baseline system and instrumental measures, 2017, Comput. Speech Lang.

[6] Samy Bengio, et al. Tacotron: Towards End-to-End Speech Synthesis, 2017, INTERSPEECH.

[7] Adam Coates, et al. Deep Voice: Real-time Neural Text-to-Speech, 2017, ICML.

[8] Junichi Yamagishi, et al. CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit, 2016.

[9] K. Harrington, et al. Acoustic parameters of speech: Lack of correlation with perceptual and questionnaire‐based speech evaluation in patients with oral and oropharyngeal cancer treated with primary surgery, 2016, Head & neck.

[10] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Yoshua Bengio, et al. Attention-Based Models for Speech Recognition, 2015, NIPS.

[12] Simon Osindero, et al. Conditional Generative Adversarial Nets, 2014, ArXiv.

[13] Hemant A. Patil, et al. Fusion of magnitude and phase-based features for objective evaluation of TTS voice, 2014, The 9th International Symposium on Chinese Spoken Language Processing.

[14] Carlos Busso, et al. IEMOCAP: interactive emotional dyadic motion capture database, 2008, Lang. Resour. Evaluation.

[15] A. Cook, et al. Experimental evaluation of duration modelling techniques for automatic speech recognition, 1987, ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing.