Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using β-VAE
Helen M. Meng, Xunying Liu, Xixin Wu, Disong Wang, Zhiyong Wu, Hui Lu