Wasserstein GAN and Waveform Loss-Based Acoustic Model Training for Multi-Speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder
Yi Zhao | Shinji Takaki | Hieu-Thi Luong | Daisuke Saito | Nobuaki Minematsu | Junichi Yamagishi