Investigation of Enhanced Tacotron Text-to-speech Synthesis Systems with Self-attention for Pitch Accent Language
暂无分享,去创建一个
Xin Wang | Junichi Yamagishi | Shinji Takaki | Yusuke Yasuda | J. Yamagishi | Xin Wang | Shinji Takaki | Yusuke Yasuda
[1] Xin Wang,et al. Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[2] Yoshua Bengio,et al. Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations , 2016, ICLR.
[3] Wei Ping,et al. ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech , 2018, ICLR.
[4] Bowen Zhou,et al. A Structured Self-attentive Sentence Embedding , 2017, ICLR.
[5] Kevin Knight,et al. Multi-Source Neural Translation , 2016, NAACL.
[6] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[8] Li-Rong Dai,et al. Forward Attention in Sequence- To-Sequence Acoustic Modeling for Speech Synthesis , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Junichi Yamagishi,et al. Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data , 2018, Odyssey.
[10] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[11] Heiga Zen,et al. Sequence-to-sequence Neural Network Model with 2D Attention for Learning Japanese Pitch Accents , 2018, INTERSPEECH.
[12] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[13] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[14] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[15] Lauri Juvela,et al. A Comparison of Recent Waveform Generation and Acoustic Modeling Methods for Neural-Network-Based Speech Synthesis , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[17] Xin Wang,et al. Investigating accuracy of pitch-accent annotations in neural network-based speech synthesis and denoising effects , 2018, INTERSPEECH.
[18] Ming Zhou,et al. Close to Human Quality TTS with Transformer , 2018, ArXiv.