StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion
暂无分享,去创建一个
Kou Tanaka | Hirokazu Kameoka | Takuhiro Kaneko | Nobukatsu Hojo | H. Kameoka | Takuhiro Kaneko | Kou Tanaka | Nobukatsu Hojo
[1] Chung-Hsien Wu,et al. Map-based adaptation for speech conversion using adaptation data selection and non-parallel training , 2006, INTERSPEECH.
[2] Seyed Hamidreza Mohammadi,et al. Voice conversion using deep neural networks with speaker-independent pre-training , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).
[3] Hirokazu Kameoka,et al. Generative adversarial network-based postfilter for statistical parametric speech synthesis , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Yonghong Yan,et al. High Quality Voice Conversion through Phoneme-Based Linear Mapping Functions with STRAIGHT for Mandarin , 2007, Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007).
[5] Lior Wolf,et al. Unsupervised Cross-Domain Image Generation , 2016, ICLR.
[6] Tomoki Toda,et al. Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech , 2012, Speech Commun..
[7] Alexei A. Efros,et al. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[8] Moncef Gabbouj,et al. Voice Conversion Using Partial Least Squares Regression , 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[9] Kou Tanaka,et al. ATTS2S-VC: Sequence-to-sequence Voice Conversion with Attention and Context Preservation Mechanisms , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Li-Rong Dai,et al. Voice Conversion Using Deep Neural Networks With Layer-Wise Generative Training , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[11] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[12] Tomoki Toda,et al. A postfilter to modify the modulation spectrum in HMM-based speech synthesis , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Yun Lei,et al. Advances in deep neural network approaches to speaker recognition , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Haizhou Li,et al. Exemplar-Based Sparse Representation With Residual Compensation for Voice Conversion , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[15] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Simon Osindero,et al. Conditional Generative Adversarial Nets , 2014, ArXiv.
[17] Haifeng Li,et al. A KL Divergence and DNN-Based Approach to Voice Conversion without Parallel Training Sentences , 2016, INTERSPEECH.
[18] Kou Tanaka,et al. Cyclegan-VC2: Improved Cyclegan-based Non-parallel Voice Conversion , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Yann Dauphin,et al. Language Modeling with Gated Convolutional Networks , 2016, ICML.
[20] Athanasios Mouchtaris,et al. Nonparallel training for voice conversion based on a parameter adaptation approach , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[21] Hirokazu Kameoka,et al. Generative Adversarial Network-Based Postfilter for STFT Spectrograms , 2017, INTERSPEECH.
[22] Hirokazu Kameoka,et al. Non-native speech conversion with consistency-aware recursive network and generative adversarial network , 2017, 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
[23] Takuhiro Kaneko,et al. Class-Distinct and Class-Mutual Image Generation with GANs , 2018, BMVC.
[24] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[25] Ping Tan,et al. DualGAN: Unsupervised Dual Learning for Image-to-Image Translation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[26] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[27] Takeru Miyato,et al. cGANs with Projection Discriminator , 2018, ICLR.
[28] Keikichi Hirose,et al. One-to-Many Voice Conversion Based on Tensor Representation of Speaker Space , 2011, INTERSPEECH.
[29] John-Paul Hosom,et al. Improving the intelligibility of dysarthric speech , 2007, Speech Commun..
[30] Kou Tanaka,et al. ConvS2S-VC: Fully Convolutional Sequence-to-Sequence Voice Conversion , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[31] Eric Moulines,et al. Continuous probabilistic transform for voice conversion , 1998, IEEE Trans. Speech Audio Process..
[32] Kou Tanaka,et al. Synthetic-to-Natural Speech Waveform Conversion Using Cycle-Consistent Adversarial Networks , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[33] Tomoki Toda,et al. Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[34] Alexei A. Efros,et al. Learning Dense Correspondence via 3D-Guided Cycle Consistency , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Hirokazu Kameoka,et al. CycleGAN-VC: Non-parallel Voice Conversion Using Cycle-Consistent Adversarial Networks , 2018, 2018 26th European Signal Processing Conference (EUSIPCO).
[36] Hyunsoo Kim,et al. Learning to Discover Cross-Domain Relations with Generative Adversarial Networks , 2017, ICML.
[37] Junichi Yamagishi,et al. The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods , 2018, Odyssey.
[38] Tetsuya Takiguchi,et al. Exemplar-Based Voice Conversion Using Sparse Representation in Noisy Environments , 2013, IEICE Trans. Fundam. Electron. Commun. Comput. Sci..
[39] Kou Tanaka,et al. StarGAN-VC: non-parallel many-to-many Voice Conversion Using Star Generative Adversarial Networks , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[40] Tetsuya Takiguchi,et al. Non-Parallel Training in Voice Conversion Using an Adaptive Restricted Boltzmann Machine , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[41] Jung-Woo Ha,et al. StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[42] Hirokazu Kameoka,et al. Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks , 2017, ArXiv.
[43] Kun Li,et al. Voice conversion using deep Bidirectional Long Short-Term Memory based Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[44] Tetsuya Takiguchi,et al. Voice Conversion Based on Speaker-Dependent Restricted Boltzmann Machines , 2014, IEICE Trans. Inf. Syst..
[45] Tomoki Toda,et al. Eigenvoice conversion based on Gaussian mixture model , 2006, INTERSPEECH.
[46] 拓海 杉山,et al. “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .
[47] Andrea Vedaldi,et al. Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.
[48] Raymond Y. K. Lau,et al. Least Squares Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[49] Jonathon Shlens,et al. A Learned Representation For Artistic Style , 2016, ICLR.
[50] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[51] Hirokazu Kameoka,et al. Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial Networks , 2017, INTERSPEECH.
[52] Mikihiro Nakagiri,et al. Statistical Voice Conversion Techniques for Body-Conducted Unvoiced Speech Enhancement , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[53] Tomoki Toda,et al. Speaker-Dependent WaveNet Vocoder , 2017, INTERSPEECH.
[54] Steve J. Young,et al. Data-driven emotion conversion in spoken English , 2009, Speech Commun..
[55] Tetsuya Takiguchi,et al. High-order sequence modeling using speaker-dependent recurrent temporal restricted boltzmann machines for voice conversion , 2014, INTERSPEECH.
[56] Jonathon Shlens,et al. Conditional Image Synthesis with Auxiliary Classifier GANs , 2016, ICML.
[57] Shinnosuke Takamichi,et al. Non-Parallel Voice Conversion Using Variational Autoencoders Conditioned by Phonetic Posteriorgrams and D-Vectors , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[58] Kou Tanaka,et al. ACVAE-VC: Non-Parallel Voice Conversion With Auxiliary Classifier Variational Autoencoder , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[59] Yu Tsao,et al. Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks , 2017, INTERSPEECH.
[60] Masanori Morise,et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications , 2016, IEICE Trans. Inf. Syst..
[61] Daniel Rueckert,et al. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[62] Kishore Prahallad,et al. Spectral Mapping Using Artificial Neural Networks for Voice Conversion , 2010, IEEE Transactions on Audio, Speech, and Language Processing.