Improving Prosody with Linguistic and Bert Derived Features in Multi-Speaker Based Mandarin Chinese Neural TTS
暂无分享,去创建一个
Frank K. Soong | Huaiping Ming | Lei He | Yujia Xiao | F. Soong | Lei He | Huaiping Ming | Yujia Xiao
[1] James R. Glass,et al. Towards Transfer Learning for End-to-End Speech Synthesis from Deep Pre-Trained Language Models , 2019, ArXiv.
[2] Rodrigo Nogueira,et al. Portuguese Named Entity Recognition using BERT-CRF , 2019, ArXiv.
[3] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.
[4] Frank K. Soong,et al. Feature reinforcement with word embedding and parsing information in neural TTS , 2019, ArXiv.
[5] Heiga Zen,et al. Statistical Parametric Speech Synthesis , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[6] Zhenyu Wang,et al. EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System , 2018, INTERSPEECH.
[7] Tomoki Toda,et al. Pre-Trained Text Embeddings for Enhanced Text-to-Speech Synthesis , 2019, INTERSPEECH.
[8] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[9] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[10] Sercan Ömer Arik,et al. Deep Voice 2: Multi-Speaker Neural Text-to-Speech , 2017, NIPS.
[11] Heiga Zen,et al. Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends , 2015, IEEE Signal Processing Magazine.
[12] Joseph P. Olive,et al. Text-to-speech synthesis , 1995, AT&T Technical Journal.
[13] Heiga Zen,et al. Statistical parametric speech synthesis using deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[14] Patrick Nguyen,et al. Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis , 2018, NeurIPS.
[15] Frank K. Soong,et al. Modeling Multi-speaker Latent Space to Improve Neural TTS: Quick Enrolling New Speaker and Enhancing Premium Voice , 2018, ArXiv.
[16] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[17] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Tomoki Toda,et al. Speaker-Dependent WaveNet Vocoder , 2017, INTERSPEECH.
[19] Yoshua Bengio,et al. Char2Wav: End-to-End Speech Synthesis , 2017, ICLR.
[20] Lidong Bing,et al. Exploiting BERT for End-to-End Aspect-based Sentiment Analysis , 2019, EMNLP.
[21] Shuang Xu,et al. First Step Towards End-to-End Parametric TTS Synthesis: Generating Spectral Parameters with Neural Attention , 2016, INTERSPEECH.
[22] Adam Coates,et al. Deep Voice: Real-time Neural Text-to-Speech , 2017, ICML.
[23] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Ying Chen,et al. Implementing Prosodic Phrasing in Chinese End-to-end Speech Synthesis , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).