Tacotron: Towards End-to-End Speech Synthesis
Samy Bengio | Quoc V. Le | Navdeep Jaitly | Yuxuan Wang | Yonghui Wu | Zhifeng Chen | Ron J. Weiss | Rif A. Saurous | Yannis Agiomyrgiannakis | Ying Xiao | Zongheng Yang | R. J. Skerry-Ryan | Daisy Stanton | Rob Clark
References
[1] Jae S. Lim, et al. Signal estimation from modified short-time Fourier transform, 1983, ICASSP.
[2] Jae Lim, et al. Signal estimation from modified short-time Fourier transform, 1984.
[3] Joseph P. Olive, et al. Text-to-speech synthesis, 1995, AT&T Technical Journal.
[4] Heiga Zen, et al. Statistical Parametric Speech Synthesis, 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[5] Yoshua Bengio, et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, 2014, arXiv.
[6] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[7] Yannis Agiomyrgiannakis, et al. Vocaine the vocoder and applications in speech synthesis, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Geoffrey E. Hinton, et al. Grammar as a Foreign Language, 2014, NIPS.
[9] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[10] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[11] Samy Bengio, et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks, 2015, NIPS.
[12] Jürgen Schmidhuber, et al. Highway Networks, 2015, arXiv.
[13] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[14] Heiga Zen, et al. Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices, 2016, INTERSPEECH.
[15] Alexander Gutkin, et al. Recent Advances in Google Real-Time HMM-Driven Unit Selection Synthesizer, 2016, INTERSPEECH.
[16] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Quoc V. Le, et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Matthias Bethge, et al. A note on the evaluation of generative models, 2015, ICLR.
[19] Shuang Xu, et al. First Step Towards End-to-End Parametric TTS Synthesis: Generating Spectral Parameters with Neural Attention, 2016, INTERSPEECH.
[20] Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, arXiv.
[21] George Kurian, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, arXiv.
[22] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[23] Navdeep Jaitly, et al. RNN Approaches to Text Normalization: A Challenge, 2016, arXiv.
[24] Adam Coates, et al. Deep Voice: Real-time Neural Text-to-Speech, 2017, ICML.
[25] Yoshua Bengio, et al. Char2Wav: End-to-End Speech Synthesis, 2017, ICLR.
[26] Jason Lee, et al. Fully Character-Level Neural Machine Translation without Explicit Segmentation, 2016, TACL.
[27] Yoshua Bengio, et al. SampleRNN: An Unconditional End-to-End Neural Audio Generation Model, 2016, ICLR.