PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
Ye Jia | Heiga Zen | Jonathan Shen | Yu Zhang | Yonghui Wu
[1] Yang Zhang, et al. Unified Mandarin TTS Front-end Based on Distilled BERT Model, 2020, arXiv.
[2] Ron J. Weiss, et al. Wave-Tacotron: Spectrogram-Free End-to-End Text-to-Speech Synthesis, 2020, ICASSP 2021.
[3] Bowen Zhou, et al. Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-End Speech Synthesis, 2020, ICASSP 2021.
[4] Manish Sharma, et al. Improving the Prosody of RNN-Based English Text-To-Speech Synthesis by Incorporating a BERT Model, 2020, INTERSPEECH.
[5] Heiga Zen, et al. Parallel Tacotron: Non-Autoregressive and Controllable TTS, 2020, ICASSP 2021.
[6] Heiga Zen, et al. Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling, 2020, arXiv.
[7] Tie-Yan Liu, et al. FastSpeech 2: Fast and High-Quality End-to-End Text to Speech, 2020, ICLR.
[8] Tomoki Toda, et al. Pre-Trained Text Embeddings for Enhanced Text-to-Speech Synthesis, 2019, INTERSPEECH.
[9] Michael Hahn, et al. Theoretical Limitations of Self-Attention in Neural Sequence Models, 2019, TACL.
[10] Yoram Singer, et al. Memory Efficient Adaptive Optimization, 2019, NeurIPS.
[11] Guillaume Lample, et al. Cross-lingual Language Model Pretraining, 2019, NeurIPS.
[12] Frank K. Soong, et al. Feature reinforcement with word embedding and parsing information in neural TTS, 2019, arXiv.
[13] Yoshua Bengio, et al. Representation Mixing for TTS Synthesis, 2018, ICASSP 2019.
[14] Shujie Liu, et al. Neural Speech Synthesis with Transformer Network, 2018, AAAI.
[15] Yuxuan Wang, et al. Semi-supervised Training for Improving Data Efficiency in End-to-end Speech Synthesis, 2018, ICASSP 2019.
[16] Taku Kudo, et al. SentencePiece: A Simple and Language Independent Subword Tokenizer and Detokenizer for Neural Text Processing, 2018, EMNLP.
[17] Lukasz Kaiser, et al. Universal Transformers, 2018, ICLR.
[18] Ron J. Weiss, et al. Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis, 2018, NeurIPS.
[19] Yuxuan Wang, et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron, 2018, ICML.
[20] Yuxuan Wang, et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis, 2018, ICML.
[21] Erich Elsen, et al. Efficient Neural Audio Synthesis, 2018, ICML.
[22] Navdeep Jaitly, et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2017, ICASSP 2018.
[23] Sercan Ö. Arik, et al. Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning, 2017, ICLR.
[24] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[25] Sercan Ö. Arik, et al. Deep Voice 2: Multi-Speaker Neural Text-to-Speech, 2017, NIPS.
[26] Samy Bengio, et al. Tacotron: Towards End-to-End Speech Synthesis, 2017, INTERSPEECH.
[27] Adam Coates, et al. Deep Voice: Real-time Neural Text-to-Speech, 2017, ICML.
[28] Yoshua Bengio, et al. Char2Wav: End-to-End Speech Synthesis, 2017, ICLR.
[29] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[30] Alexandra Birch, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.
[31] Richard Sproat, et al. The Kestrel TTS text normalization system, 2014, Natural Language Engineering.
[32] Mike Schuster, et al. Japanese and Korean voice search, 2012, ICASSP 2012.
[33] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.