Benjamin van Niekerk | Hugo Seuté | Marc-André Carbonneau | Julian Zaïdi