[1] Heiga Zen et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[2] Navdeep Jaitly et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Haruo Hosoya et al. Group-based Learning of Disentangled Representations with Generalizability for Novel Contents, 2019, IJCAI.
[4] Tomoki Toda et al. Baseline System of Voice Conversion Challenge 2020 with Cyclic Variational Autoencoder and Parallel WaveGAN, 2020, Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020.
[5] Max Welling et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[6] Tomoki Toda et al. Non-Parallel Voice Conversion with Cyclic Variational Autoencoder, 2019, INTERSPEECH.
[7] Max Welling et al. Semi-supervised Learning with Deep Generative Models, 2014, NIPS.
[8] Junichi Yamagishi et al. CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit, 2017.
[9] Lin-Shan Lee et al. Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations, 2018, INTERSPEECH.
[10] Ben Poole et al. Weakly-Supervised Disentanglement Without Compromises, 2020, ICML.
[11] Zhe Gan et al. Improving Zero-shot Voice Style Transfer via Disentangled Representation Learning, 2021, ICLR.
[12] Hiroko Terasawa et al. A Statistical Model of Timbre Perception, 2006, SAPA@INTERSPEECH.
[13] Yu Tsao et al. Voice Conversion from Non-parallel Corpora Using Variational Auto-encoder, 2016, Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA).
[14] Tomoki Toda et al. Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory, 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[15] Kou Tanaka et al. StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion, 2019, INTERSPEECH.
[16] Mark Hasegawa-Johnson et al. Zero-Shot Voice Style Transfer with Only Autoencoder Loss, 2019, ICML.
[17] Christopher Burgess et al. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework, 2017, ICLR.
[18] Moncef Gabbouj et al. Voice Conversion Using Dynamic Kernel Partial Least Squares Regression, 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[19] Kou Tanaka et al. ACVAE-VC: Non-parallel Many-to-Many Voice Conversion with Auxiliary Classifier Variational Autoencoder, 2018, arXiv.
[20] Ricardo Gutierrez-Osuna et al. Group Latent Embedding for Vector Quantized Variational Autoencoder in Non-Parallel Voice Conversion, 2019, INTERSPEECH.
[21] Kou Tanaka et al. CycleGAN-VC2: Improved CycleGAN-Based Non-parallel Voice Conversion, 2019, 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Dongsuk Yook et al. Many-to-Many Voice Conversion Using Conditional Cycle-Consistent Adversarial Networks, 2020, 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Yu Zhang et al. Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data, 2017, NIPS.
[24] Eric Moulines et al. Continuous Probabilistic Transform for Voice Conversion, 1998, IEEE Transactions on Speech and Audio Processing.
[25] Hung-Yi Lee et al. VQVC+: One-Shot Voice Conversion by Vector Quantization and U-Net Architecture, 2020, INTERSPEECH.