Enhancing Monotonicity for Robust Autoregressive Transformer TTS
Xiangyu Liang | Runnan Li | Yanqing Liu | Sheng Zhao | Zhiyong Wu | Helen M. Meng