Sequence-to-Sequence Voice Conversion with Similarity Metric Learned Using Generative Adversarial Networks
Hirokazu Kameoka | Kunio Kashino | Takuhiro Kaneko | Kaoru Hiramatsu
[1] Eric Moulines,et al. Statistical methods for voice quality transformation , 1995, EUROSPEECH.
[2] Yannis Stylianou,et al. A system for voice conversion based on probabilistic classification and a harmonic plus noise model , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Hideki Kawahara,et al. Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT , 2001, MAVEBA.
[4] Tomoki Toda,et al. Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation , 2006, INTERSPEECH.
[5] Yonghong Yan,et al. High Quality Voice Conversion through Phoneme-Based Linear Mapping Functions with STRAIGHT for Mandarin , 2007, Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007).
[6] Keiichi Tokuda,et al. A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis , 2007, IEICE Trans. Inf. Syst..
[7] Tomoki Toda,et al. Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[8] Daniel Erro,et al. INCA Algorithm for Training Voice Conversion Systems From Nonparallel Corpora , 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[9] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.
[10] Moncef Gabbouj,et al. Ways to Implement Global Variance in Statistical Speech Synthesis , 2012, INTERSPEECH.
[11] Andrew L. Maas. Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013.
[12] Tetsuya Takiguchi,et al. Exemplar-Based Voice Conversion Using Sparse Representation in Noisy Environments , 2013, IEICE Trans. Fundam. Electron. Commun. Comput. Sci..
[13] Yoon Kim,et al. Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.
[14] Tetsuya Takiguchi,et al. Voice Conversion Based on Speaker-Dependent Restricted Boltzmann Machines , 2014, IEICE Trans. Inf. Syst..
[15] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[16] Li-Rong Dai,et al. Voice Conversion Using Deep Neural Networks With Layer-Wise Generative Training , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[17] Haizhou Li,et al. Exemplar-Based Sparse Representation With Residual Compensation for Voice Conversion , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[18] Tetsuya Takiguchi,et al. High-order sequence modeling using speaker-dependent recurrent temporal restricted boltzmann machines for voice conversion , 2014, INTERSPEECH.
[19] Seyed Hamidreza Mohammadi,et al. Voice conversion using deep neural networks with speaker-independent pre-training , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).
[20] Tomoki Toda,et al. A postfilter to modify the modulation spectrum in HMM-based speech synthesis , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Tomoki Toda,et al. Statistical singing voice conversion with direct waveform modification based on the spectrum differential , 2014, INTERSPEECH.
[22] Rob Fergus,et al. Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks , 2015, NIPS.
[23] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[24] Xiang Zhang,et al. Character-level Convolutional Networks for Text Classification , 2015, NIPS.
[25] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[26] Tianqi Chen,et al. Empirical Evaluation of Rectified Activations in Convolutional Network , 2015, ArXiv.
[27] Kun Li,et al. Voice conversion using deep Bidirectional Long Short-Term Memory based Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[28] Tomoki Toda,et al. The Voice Conversion Challenge 2016 , 2016, INTERSPEECH.
[29] Soumith Chintala,et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.
[30] Ole Winther,et al. Autoencoding beyond pixels using a learned similarity metric , 2015, ICML.
[31] Shinnosuke Takamichi,et al. Training algorithm to deceive Anti-Spoofing Verification for DNN-based speech synthesis , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[33] Hirokazu Kameoka,et al. Generative Adversarial Network-Based Postfilter for STFT Spectrograms , 2017, INTERSPEECH.
[34] Lauri Juvela,et al. Non-parallel voice conversion using i-vector PLDA: towards unifying speaker verification and transformation , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] Yann Dauphin,et al. Language Modeling with Gated Convolutional Networks , 2016, ICML.
[36] Hirokazu Kameoka,et al. Generative adversarial network-based postfilter for statistical parametric speech synthesis , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).