Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis
Jianqing Sun | Jiaen Liang | Binghuai Lin | Dengfeng Ke | Ya Li | Jinlong Xue | Yayue Deng | Qi Luo | Yukang Jia
[1] Fengyu Yang, et al. Improving Emotional Speech Synthesis by Using SUS-Constrained VAE and Text Encoder Aggregation, 2021, 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Haizhou Li, et al. Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability, 2021, Interspeech.
[3] Tao Qin, et al. AdaSpeech: Adaptive Text to Speech for Custom Voice, 2021, ICLR.
[4] Shan Liu, et al. FeatherTTS: Robust and Efficient attention based Neural TTS, 2020, 11th ISCA Speech Synthesis Workshop (SSW 11).
[5] Tie-Yan Liu, et al. FastSpeech 2: Fast and High-Quality End-to-End Text to Speech, 2020, ICLR.
[6] Jin Xu, et al. MultiSpeech: Multi-Speaker Text to Speech with Transformer, 2020, Interspeech.
[7] Sungwon Kim, et al. Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search, 2020, NeurIPS.
[8] Shuang Liang, et al. Flow-TTS: A Non-Autoregressive Network for Text to Speech Based on Flow, 2020, 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Yoshua Bengio, et al. MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis, 2019, NeurIPS.
[10] Lei He, et al. Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS, 2019, Interspeech.
[11] Li-Rong Dai, et al. Forward Attention in Sequence-to-Sequence Acoustic Modeling for Speech Synthesis, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Yuxuan Wang, et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis, 2018, ICML.
[13] Navdeep Jaitly, et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Colin Raffel, et al. Monotonic Chunkwise Attention, 2017, ICLR.
[15] Soo-Young Lee, et al. Emotional End-to-End Neural Speech Synthesizer, 2017, NIPS.
[16] Morgan Sonderegger, et al. Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi, 2017, Interspeech.
[17] Colin Raffel, et al. Online and Linear-Time Attention by Enforcing Monotonic Alignments, 2017, ICML.
[18] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[19] Yoshua Bengio, et al. Attention-Based Models for Speech Recognition, 2015, NIPS.
[20] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[21] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[22] Alex Graves. Generating Sequences With Recurrent Neural Networks, 2013, ArXiv.