Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Navdeep Jaitly | Yuxuan Wang | Yonghui Wu | Zhifeng Chen | Ron J. Weiss | Rif A. Saurous | Yannis Agiomyrgiannakis | Yu Zhang | Ruoming Pang | Zongheng Yang | RJ Skerry-Ryan | Jonathan Shen | Mike Schuster
[1] Jae S. Lim, et al. Signal estimation from modified short-time Fourier transform, 1983, ICASSP.
[2] Christopher M. Bishop. Mixture Density Networks, 1994.
[3] Joseph P. Olive, et al. Text-to-speech synthesis, 1995, AT&T Technical Journal.
[4] Alan W. Black, et al. Unit selection in a concatenative speech synthesis system using a large speech database, 1996, ICASSP.
[5] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[6] Kuldip K. Paliwal, et al. Bidirectional recurrent neural networks, 1997, IEEE Trans. Signal Process.
[7] Paul Taylor, et al. Automatically clustering similar units for unit selection in speech synthesis, 1997, EUROSPEECH.
[8] Mike Schuster, et al. On supervised learning from sequential data with applications for speech recognition, 1999.
[9] Keiichi Tokuda, et al. Speech parameter generation algorithms for HMM-based speech synthesis, 2000, ICASSP.
[10] Heiga Zen, et al. Statistical Parametric Speech Synthesis, 2007, ICASSP.
[11] Heiga Zen, et al. Statistical parametric speech synthesis using deep neural networks, 2013, ICASSP.
[12] Heiga Zen, et al. Speech Synthesis Based on Hidden Markov Models, 2013, Proceedings of the IEEE.
[13] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.
[14] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[15] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[16] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[17] Yoshua Bengio, et al. Attention-Based Models for Speech Recognition, 2015, NIPS.
[18] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[19] Heiga Zen, et al. Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices, 2016, INTERSPEECH.
[20] Alexander Gutkin, et al. Recent Advances in Google Real-Time HMM-Driven Unit Selection Synthesizer, 2016, INTERSPEECH.
[21] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[22] Tomoki Toda, et al. Speaker-Dependent WaveNet Vocoder, 2017, INTERSPEECH.
[23] Sercan Ömer Arik, et al. Deep Voice 2: Multi-Speaker Neural Text-to-Speech, 2017, NIPS.
[24] Adam Coates, et al. Deep Voice: Real-time Neural Text-to-Speech, 2017, ICML.
[25] Xi Chen, et al. PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications, 2017, ICLR.
[26] S. Davis, P. Mermelstein. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences, 1980, IEEE Transactions on Acoustics, Speech, and Signal Processing.
[27] Sercan Ömer Arik, et al. Deep Voice 3: 2000-Speaker Neural Text-to-Speech, 2017, ICLR 2018.
[28] Yoshua Bengio, et al. Char2Wav: End-to-End Speech Synthesis, 2017, ICLR.
[29] Yoshua Bengio, et al. Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations, 2016, ICLR.
[30] Samy Bengio, et al. Tacotron: Towards End-to-End Speech Synthesis, 2017, INTERSPEECH.
[31] Heiga Zen, et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis, 2017, ICML.