Improve Cross-Lingual Text-To-Speech Synthesis on Monolingual Corpora with Pitch Contour Information
Haitong Zhang | Wenjie Ou | Haoyue Zhan | Yue Lin | Hao Zhan
[1] Masanori Morise, et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications, 2016, IEICE Trans. Inf. Syst.
[2] Shinnosuke Takamichi, et al. Acoustic model-based subword tokenization and prosodic-context extraction without language knowledge for text-to-speech synthesis, 2020, Speech Commun.
[3] U. Barbara. Disentangling stress and pitch accent: A typology of prominence at different prosodic levels, 2012.
[4] Tao Qin, et al. FastSpeech 2: Fast and High-Quality End-to-End Text to Speech, 2021, ICLR.
[5] Kyubyong Park, et al. CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages, 2019, INTERSPEECH.
[6] Songxiang Liu, et al. Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Rob Goedemans, et al. A survey of word accentual patterns in the languages of the world, 2010.
[8] Chengzhu Yu, et al. DurIAN: Duration Informed Attention Network for Speech Synthesis, 2020, INTERSPEECH.
[9] Tara N. Sainath, et al. Bytes Are All You Need: End-to-end Multilingual Speech Recognition and Synthesis with Bytes, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] D. Horga. Handbook of the International Phonetic Association: A Guide to the Use of the International Phonetic Alphabet. Cambridge: Cambridge University Press (1999), (204 pages), 1999.
[11] Erich Elsen, et al. End-to-End Adversarial Text-to-Speech, 2020, ArXiv.
[12] Lei Chen, et al. Cross-Lingual, Multi-Speaker Text-To-Speech Synthesis Using Neural Speaker Embedding, 2019, INTERSPEECH.
[13] Lei He, et al. Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS, 2019, INTERSPEECH.
[14] Chunghyun Ahn, et al. Emotional Speech Synthesis with Rich and Granularized Control, 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Li-Rong Dai, et al. Forward Attention in Sequence-to-Sequence Acoustic Modeling for Speech Synthesis, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Wei Song, et al. Building a mixed-lingual neural TTS system with only monolingual data, 2019, INTERSPEECH.
[17] Zhengchen Zhang, et al. A light-weight method of building an LSTM-RNN-based bilingual TTS system, 2017, 2017 International Conference on Asian Language Processing (IALP).
[18] Hung-yi Lee, et al. End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning, 2019, INTERSPEECH.
[19] Navdeep Jaitly, et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Xu Tan, et al. FastSpeech: Fast, Robust and Controllable Text to Speech, 2019, NeurIPS.
[21] Yoshua Bengio, et al. Attention-Based Models for Speech Recognition, 2015, NIPS.
[22] Ondrej Dusek, et al. One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech, 2020, INTERSPEECH.
[23] Xin Wang, et al. Investigation of Enhanced Tacotron Text-to-speech Synthesis Systems with Self-attention for Pitch Accent Language, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Alex Graves, et al. Generating Sequences With Recurrent Neural Networks, 2013, ArXiv.
[25] Heiga Zen, et al. Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning, 2019, INTERSPEECH.
[26] Yuxuan Wang, et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis, 2018, ICML.
[27] Shinnosuke Takamichi, et al. Prosody-aware subword embedding considering Japanese intonation systems and its application to DNN-based multi-dialect speech synthesis, 2018, 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
[28] Daniel Povey, et al. The Kaldi Speech Recognition Toolkit, 2011.
[29] Lior Wolf, et al. Unsupervised Polyglot Text-to-speech, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Jianwei Yu, et al. End-to-end Code-switched TTS with Mix of Monolingual Recordings, 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Samy Bengio, et al. Tacotron: Towards End-to-End Speech Synthesis, 2017, INTERSPEECH.
[32] Yuxuan Wang, et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron, 2018, ICML.