[1] Helen Meng, et al. Emotion Controllable Speech Synthesis Using Emotion-Unlabeled Dataset with the Assistance of Cross-Domain Speech Emotion Recognition, 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Taghi M. Khoshgoftaar, et al. A survey on Image Data Augmentation for Deep Learning, 2019, Journal of Big Data.
[3] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[4] Tao Li, et al. Controllable Emotion Transfer For End-to-End Speech Synthesis, 2020, 2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP).
[5] Silvio Savarese, et al. Generalizing to Unseen Domains via Adversarial Data Augmentation, 2018, NeurIPS.
[6] Berrak Sisman, et al. Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability, 2021, Interspeech 2021.
[7] Larry S. Davis, et al. Image ranking and retrieval based on multi-attribute queries, 2011, CVPR 2011.
[8] Berrak Sisman, et al. Seen and Unseen Emotional Style Transfer for Voice Conversion with A New Emotional Speech Dataset, 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Thorsten Joachims, et al. Optimizing search engines using clickthrough data, 2002, KDD.
[10] Kristen Grauman, et al. Relative attributes, 2011, 2011 International Conference on Computer Vision.
[11] Geng Yang, et al. Controlling Emotion Strength with Relative Attribute for End-to-End Speech Synthesis, 2019, 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[12] Yu Tsao, et al. Quality-Net: An End-to-End Non-intrusive Speech Quality Assessment Model based on BLSTM, 2018, INTERSPEECH.
[13] Vikas Singh, et al. Efficient Relative Attribute Learning Using Graph Neural Networks, 2018, ECCV.
[14] S. R. Livingstone, et al. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English, 2018, PLoS ONE.
[15] Haizhou Li, et al. Expressive TTS Training with Frame and Style Reconstruction Loss, 2020, ArXiv.
[16] Yu Tsao, et al. MOSNet: Deep Learning based Objective Assessment for Voice Conversion, 2019, INTERSPEECH.
[17] Shan Yang, et al. Fine-Grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis, 2020, 2021 IEEE Spoken Language Technology Workshop (SLT).
[18] Björn W. Schuller, et al. Recent developments in openSMILE, the Munich open-source multimedia feature extractor, 2013, ACM Multimedia.
[19] Philip N. Garner, et al. Improving Emotional TTS with an Emotion Intensity Input from Unsupervised Extraction, 2021, 11th ISCA Speech Synthesis Workshop (SSW 11).