Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning
Heiga Zen | Bhuvana Ramabhadran | Yonghui Wu | Andrew Rosenberg | Zhifeng Chen | Ron J. Weiss | Ye Jia | Yu Zhang | R. J. Skerry-Ryan
[1] Yuxuan Wang et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron, 2018, ICML.
[2] Lior Wolf et al. Fitting New Speakers Based on a Short Untranscribed Sample, 2018, ICML.
[3] Tara N. Sainath et al. Bytes Are All You Need: End-to-end Multilingual Speech Recognition and Synthesis with Bytes, 2018, ICASSP 2019.
[4] James Glass et al. Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization, 2019, ICASSP 2019.
[5] Lior Wolf et al. Unsupervised Polyglot Text-to-speech, 2019, ICASSP 2019.
[6] Masanori Morise et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications, 2016, IEICE Trans. Inf. Syst.
[7] Taesu Kim et al. Learning pronunciation from a foreign language in speech synthesis networks, 2018, ArXiv.
[8] Yoshua Bengio et al. Char2Wav: End-to-End Speech Synthesis, 2017, ICLR.
[9] Yutaka Matsuo et al. Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder, 2018, INTERSPEECH.
[10] Sercan Ömer Arik et al. Neural Voice Cloning with a Few Samples, 2018, NeurIPS.
[11] Samy Bengio et al. Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model, 2017, ArXiv.
[12] Patrick Nguyen et al. Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis, 2018, NeurIPS.
[13] Sercan Ömer Arik et al. Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning, 2017, ICLR.
[14] Yuxuan Wang et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis, 2018, ICML.
[15] Erich Elsen et al. Efficient Neural Audio Synthesis, 2018, ICML.
[16] Yoshua Bengio et al. Representation Mixing for TTS Synthesis, 2018, ICASSP 2019.
[17] Heiga Zen et al. Hierarchical Generative Modeling for Controllable Speech Synthesis, 2018, ICLR.
[18] Quan Wang et al. Generalized End-to-End Loss for Speaker Verification, 2017, ICASSP 2018.
[19] Heiga Zen et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[20] Heiga Zen et al. Multi-Language Multi-Speaker Acoustic Modeling for LSTM-RNN Based Statistical Parametric Speech Synthesis, 2016, INTERSPEECH.
[21] Zhengchen Zhang et al. A light-weight method of building an LSTM-RNN-based bilingual TTS system, 2017, 2017 International Conference on Asian Language Processing (IALP).
[22] Walter Daelemans et al. Data-Oriented Methods for Grapheme-to-Phoneme Conversion, 1993, EACL.
[23] Max Welling et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[24] Heiga Zen et al. Statistical Parametric Speech Synthesis Based on Speaker and Language Factorization, 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[25] Xin Wang et al. Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis, 2018, ArXiv.
[26] Navdeep Jaitly et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2017, ICASSP 2018.
[27] Heiga Zen et al. Sample Efficient Adaptive Text-to-Speech, 2018, ICLR.
[28] Hui Liang et al. Cross-Lingual Speaker Discrimination Using Natural and Synthetic Speech, 2011, INTERSPEECH.