JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment
暂无分享,去创建一个
D. Lim | Won Jang | O. Gyeonghwan | Hyeyeon Park | Bongwan Kim | Jesam Yoon
[1] Jae Lim,et al. Signal estimation from modified short-time Fourier transform , 1984 .
[2] Colin Raffel,et al. librosa: Audio and Music Signal Analysis in Python , 2015, SciPy.
[3] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[4] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[5] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[6] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Hideyuki Tachibana,et al. Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Li-Rong Dai,et al. Forward Attention in Sequence- To-Sequence Acoustic Modeling for Speech Synthesis , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Ryan Prenger,et al. Waveglow: A Flow-based Generative Network for Speech Synthesis , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[11] Chengzhu Yu,et al. DurIAN: Duration Informed Attention Network For Multimodal Synthesis , 2019, ArXiv.
[12] Dong Yu,et al. Maximizing Mutual Information for Tacotron , 2019, ArXiv.
[13] Xu Tan,et al. FastSpeech: Fast, Robust and Controllable Text to Speech , 2019, NeurIPS.
[14] Lei Xie,et al. Pre-Alignment Guided Attention for Improving Training Efficiency and Model Stability in End-to-End Speech Synthesis , 2019, IEEE Access.
[15] Shujie Liu,et al. Neural Speech Synthesis with Transformer Network , 2018, AAAI.
[16] K. Takeda,et al. Espnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Eric Battenberg,et al. Location-Relative Attention Mechanisms for Robust Long-Form Speech Synthesis , 2019, IEEE International Conference on Acoustics, Speech, and Signal Processing.
[18] Tian Xia,et al. Aligntts: Efficient Feed-Forward Text-to-Speech System Without Explicit Alignment , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Liyuan Liu,et al. On the Variance of the Adaptive Learning Rate and Beyond , 2019, ICLR.
[20] Ryuichi Yamamoto,et al. Parallel Wavegan: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-Resolution Spectrogram , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).