GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion
R. Barra-Chicote | Grzegorz Beringer | Thomas Merritt | Abdelhamid Ezzerg | Magdalena Proszewska | Daniel Sáez-Trigueros