Neural Sequence-to-Sequence Speech Synthesis Using a Hidden Semi-Markov Model Based Structured Attention Mechanism
Yoshihiko Nankaku | Keiichi Tokuda | Kei Hashimoto | Shinji Takaki | Takenori Yoshimura | Keiichiro Oura | Kenta Sumiya
[1] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[2] Heiga Zen,et al. Hidden Semi-Markov Model Based Speech Synthesis System , 2006.
[3] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[4] Hideyuki Tachibana,et al. Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Hideki Kawahara,et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds , 1999, Speech Commun.
[6] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Xu Tan,et al. FastSpeech: Fast, Robust and Controllable Text to Speech , 2019, NeurIPS.
[8] Li-Rong Dai,et al. Forward Attention in Sequence-to-Sequence Acoustic Modeling for Speech Synthesis , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Xin Wang,et al. End-to-End Text-to-Speech Using Latent Duration Based on VQ-VAE , 2020, ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[11] Alexander M. Rush,et al. Structured Attention Networks , 2017, ICLR.
[12] Tian Xia,et al. AlignTTS: Efficient Feed-Forward Text-to-Speech System Without Explicit Alignment , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Yoshihiko Nankaku,et al. Temporal modeling in neural network based statistical parametric speech synthesis , 2016, SSW.