Pre-Trained Text Embeddings for Enhanced Text-to-Speech Synthesis
暂无分享,去创建一个
Tomoki Toda | Shinji Watanabe | Kazuya Takeda | Karen Livescu | Tomoki Hayashi | Shubham Toshniwal | Karen Livescu | K. Takeda | T. Toda | Shinji Watanabe | Tomoki Hayashi | Shubham Toshniwal
[1] Patrick Nguyen,et al. Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis , 2018, NeurIPS.
[2] Yoshua Bengio,et al. End-to-end attention-based large vocabulary speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] T. Toda,et al. The NAIST Text-to-Speech System for the Blizzard Challenge 2015 , 2015, The Blizzard Challenge 2015.
[4] Sercan Ömer Arik,et al. Deep Voice 2: Multi-Speaker Neural Text-to-Speech , 2017, NIPS.
[5] Masanori Morise,et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications , 2016, IEICE Trans. Inf. Syst..
[6] Li-Rong Dai,et al. Forward Attention in Sequence- To-Sequence Acoustic Modeling for Speech Synthesis , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Yoshua Bengio,et al. Char2Wav: End-to-End Speech Synthesis , 2017, ICLR.
[8] Kevin Gimpel,et al. Gaussian Error Linear Units (GELUs) , 2016 .
[9] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[11] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[12] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[13] Adam Finkelstein,et al. Fftnet: A Real-Time Speaker-Dependent Neural Vocoder , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Sercan Ömer Arik,et al. Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning , 2017, ICLR.
[15] Yann Dauphin,et al. Convolutional Sequence to Sequence Learning , 2017, ICML.
[16] Erich Elsen,et al. Efficient Neural Audio Synthesis , 2018, ICML.
[17] Xin Wang,et al. Investigation of Enhanced Tacotron Text-to-speech Synthesis Systems with Self-attention for Pitch Accent Language , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[19] Geoffrey E. Hinton,et al. Layer Normalization , 2016, ArXiv.
[20] Hideki Kawahara,et al. Tandem-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[21] Ryan Prenger,et al. Waveglow: A Flow-based Generative Network for Speech Synthesis , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Yuxuan Wang,et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron , 2018, ICML.
[23] Wei Ping,et al. ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech , 2018, ICLR.
[24] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[25] Heiga Zen,et al. Statistical Parametric Speech Synthesis , 2007, IEEE International Conference on Acoustics, Speech, and Signal Processing.
[26] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[27] Tomoki Toda,et al. Speaker-Dependent WaveNet Vocoder , 2017, INTERSPEECH.
[28] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[29] Paul Taylor,et al. Text-to-Speech Synthesis , 2009 .
[30] Heiga Zen,et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis , 2017, ICML.
[31] Shinji Watanabe,et al. ESPnet: End-to-End Speech Processing Toolkit , 2018, INTERSPEECH.
[32] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[33] Hideyuki Tachibana,et al. Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] Tomoki Toda,et al. An Investigation of Noise Shaping with Perceptual Weighting for Wavenet-Based Speech Generation , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] Yuxuan Wang,et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis , 2018, ICML.