Emotional Speech Synthesis with Rich and Granularized Control
Se-Yun Um | Sangshin Oh | Kyungguen Byun | Inseon Jang | Chunghyun Ahn | Hong-Goo Kang
[1] Hong-Goo Kang, et al. Effective Parameter Estimation Methods for an ExcitNet Model in Generative Text-to-Speech Systems, 2019, ArXiv.
[2] Yuxuan Wang, et al. Uncovering Latent Style Factors for Expressive Speech Synthesis, 2017, ArXiv.
[3] Hideki Kawahara, et al. Restructuring Speech Representations Using a Pitch-Adaptive Time-Frequency Smoothing and an Instantaneous-Frequency-Based F0 Extraction: Possible Role of a Repetitive Structure in Sounds, 1999, Speech Communication.
[4] Adam Coates, et al. Deep Voice: Real-Time Neural Text-to-Speech, 2017, ICML.
[5] Christopher D. Manning, et al. Effective Approaches to Attention-Based Neural Machine Translation, 2015, EMNLP.
[6] Navdeep Jaitly, et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2018, ICASSP.
[7] Geoffrey E. Hinton, et al. Visualizing Data Using t-SNE, 2008, Journal of Machine Learning Research.
[8] Hong-Goo Kang, et al. An Effective Style Token Weight Control Technique for End-to-End Emotional Speech Synthesis, 2019, IEEE Signal Processing Letters.
[9] B. L. Welch. The Generalization of 'Student's' Problem When Several Different Population Variances Are Involved, 1947, Biometrika.
[10] Frank K. Soong, et al. On the Training Aspects of Deep Neural Network (DNN) for Parametric TTS Synthesis, 2014, ICASSP.
[11] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[12] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[13] Taesu Kim, et al. Robust and Fine-Grained Prosody Control of End-to-End Speech Synthesis, 2019, ICASSP.
[14] Yuxuan Wang, et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis, 2018, ICML.
[15] Heiga Zen, et al. Statistical Parametric Speech Synthesis Using Deep Neural Networks, 2013, ICASSP.
[16] Soo-Young Lee, et al. Emotional End-to-End Neural Speech Synthesizer, 2017, NIPS Workshop.
[17] Quoc V. Le, et al. Listen, Attend and Spell: A Neural Network for Large Vocabulary Conversational Speech Recognition, 2016, ICASSP.
[18] Yoshua Bengio, et al. Char2Wav: End-to-End Speech Synthesis, 2017, ICLR.
[19] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[20] Yoshua Bengio, et al. Attention-Based Models for Speech Recognition, 2015, NIPS.
[21] Yuxuan Wang, et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron, 2018, ICML.
[22] Sercan Ömer Arik, et al. Deep Voice 2: Multi-Speaker Neural Text-to-Speech, 2017, NIPS.
[23] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[24] Zhen-Hua Ling, et al. Learning Latent Representations for Style Control and Transfer in End-to-End Speech Synthesis, 2019, ICASSP.
[25] Yongguo Kang, et al. Multi-Reference Tacotron by Intercross Training for Style Disentangling, Transfer and Control in Speech Synthesis, 2019, ArXiv.