[1] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[2] Yonghong Yan, et al. High Quality Voice Conversion through Phoneme-Based Linear Mapping Functions with STRAIGHT for Mandarin, 2007, Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007).
[3] Ryan Prenger, et al. WaveGlow: A Flow-based Generative Network for Speech Synthesis, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Björn Schuller, et al. StarGAN for Emotional Speech Conversion: Validated by Data Augmentation of End-to-End Emotion Recognition, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Yu Tsao, et al. Voice conversion from non-parallel corpora using variational auto-encoder, 2016, 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA).
[6] Ming-yang, et al. The NUS & NWPU system for Voice Conversion Challenge 2020, 2020, Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020.
[7] Kishore Prahallad, et al. Spectral Mapping Using Artificial Neural Networks for Voice Conversion, 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[8] Hirokazu Kameoka, et al. CycleGAN-VC: Non-parallel Voice Conversion Using Cycle-Consistent Adversarial Networks, 2018, 2018 26th European Signal Processing Conference (EUSIPCO).
[9] Hirokazu Kameoka, et al. Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial Networks, 2017, INTERSPEECH.
[10] Mark Hasegawa-Johnson, et al. Zero-Shot Voice Style Transfer with Only Autoencoder Loss, 2019, ICML.
[11] Hirokazu Kameoka, et al. StarGAN-based Emotional Voice Conversion for Japanese Phrases, 2021, arXiv.
[12] Yoshua Bengio, et al. MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis, 2019, NeurIPS.
[13] Alexei A. Efros, et al. Image-to-Image Translation with Conditional Adversarial Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[15] Masanori Morise, et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications, 2016, IEICE Trans. Inf. Syst.
[16] Tatsuya Kawahara, et al. Recent Development of Open-Source Speech Recognition Engine Julius, 2009.
[17] Yoshua Bengio, et al. Generative Adversarial Nets, 2014, NIPS.
[18] Yu Tsao, et al. Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks, 2017, INTERSPEECH.
[19] Oriol Vinyals, et al. Neural Discrete Representation Learning, 2017, NIPS.
[20] Kou Tanaka, et al. StarGAN-VC: Non-Parallel Many-to-Many Voice Conversion Using Star Generative Adversarial Networks, 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[21] Eric Moulines, et al. Continuous probabilistic transform for voice conversion, 1998, IEEE Trans. Speech Audio Process.
[22] Hao Wang, et al. Phonetic posteriorgrams for many-to-one voice conversion without parallel data training, 2016, 2016 IEEE International Conference on Multimedia and Expo (ICME).
[23] Masanori Morise, et al. D4C, a band-aperiodicity estimator for high-quality speech synthesis, 2016, Speech Commun.
[24] Lauri Juvela, et al. Non-parallel voice conversion using i-vector PLDA: towards unifying speaker verification and transformation, 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Shigeru Katagiri, et al. ATR Japanese speech database as a tool of speech recognition and synthesis, 1990, Speech Commun.
[26] Tomoki Toda, et al. Speaker-Dependent WaveNet Vocoder, 2017, INTERSPEECH.