[1] Hirokazu Kameoka, et al. Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining, 2019, INTERSPEECH.
[2] Hao Wang, et al. Phonetic posteriorgrams for many-to-one voice conversion without parallel data training, 2016, 2016 IEEE International Conference on Multimedia and Expo (ICME).
[3] Lin-Shan Lee, et al. Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations, 2018, INTERSPEECH.
[4] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2019, ICLR.
[5] Junichi Yamagishi, et al. Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion, 2020, Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020.
[6] Yu Tsao, et al. Unsupervised Representation Disentanglement Using Cross Domain Features and Adversarial Learning in Variational Autoencoder Based Voice Conversion, 2020, IEEE Transactions on Emerging Topics in Computational Intelligence.
[7] Yu Tsao, et al. MOSNet: Deep Learning Based Objective Assessment for Voice Conversion, 2019, INTERSPEECH.
[8] Junichi Yamagishi, et al. Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions, 2020, Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020.
[9] Sanjeev Khudanpur, et al. Librispeech: An ASR Corpus Based on Public Domain Audio Books, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Geoffrey E. Hinton, et al. ImageNet Classification with Deep Convolutional Neural Networks, 2012, Commun. ACM.
[11] Doo-young Kim, et al. Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data, 2020, INTERSPEECH.
[12] Ricardo Gutierrez-Osuna, et al. Group Latent Embedding for Vector Quantized Variational Autoencoder in Non-Parallel Voice Conversion, 2019, INTERSPEECH.
[13] Junichi Yamagishi, et al. CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit (version 0.92), 2019.
[14] Jung-Woo Ha, et al. StarGAN v2: Diverse Image Synthesis for Multiple Domains, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Shinji Watanabe, et al. ESPnet: End-to-End Speech Processing Toolkit, 2018, INTERSPEECH.
[16] Tomoki Koriyama, et al. JVS Corpus: Free Japanese Multi-speaker Voice Corpus, 2019, arXiv.
[17] Jun-Yan Zhu, et al. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[18] Mark Hasegawa-Johnson, et al. F0-Consistent Many-to-Many Non-Parallel Voice Conversion via Conditional Autoencoder, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Shinji Watanabe, et al. Joint CTC-Attention Based End-to-End Speech Recognition Using Multi-Task Learning, 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Lior Wolf, et al. Unsupervised Cross-Domain Singing Voice Conversion, 2020, INTERSPEECH.
[21] Ryuichi Yamamoto, et al. Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Kou Tanaka, et al. StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion, 2019, INTERSPEECH.
[23] Mark Hasegawa-Johnson, et al. AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss, 2019, ICML.
[24] Serge J. Belongie, et al. Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[25] Berrak Sisman, et al. Seen and Unseen Emotional Style Transfer for Voice Conversion with a New Emotional Speech Dataset, 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Kou Tanaka, et al. ATTS2S-VC: Sequence-to-Sequence Voice Conversion with Attention and Context Preservation Mechanisms, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Sangeun Kum, et al. Joint Detection and Classification of Singing Voice Melody Using Convolutional Recurrent Neural Networks, 2019, Applied Sciences.
[28] Masanori Morise, et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications, 2016, IEICE Trans. Inf. Syst.
[29] Kou Tanaka, et al. CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-Spectrogram Conversion, 2020, INTERSPEECH.