Cross-lingual Style Transfer with Conditional Prior VAE and Style Loss