Representation Mixing for TTS Synthesis
Yoshua Bengio | Aaron C. Courville | Kyle Kastner | João Felipe Santos
[1] Sercan Ömer Arik, et al. Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning, 2017, ICLR.
[2] Chelsea M. Eddington, et al. How meaning similarity influences ambiguous word processing: the current state of the literature, 2015, Psychonomic Bulletin & Review.
[3] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[4] Yoshua Bengio, et al. Char2Wav: End-to-End Speech Synthesis, 2017, ICLR.
[5] Douglas Eck, et al. A Neural Representation of Sketch Drawings, 2017, ICLR.
[6] Samy Bengio, et al. Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model, 2017, ArXiv.
[7] Yoshua Bengio, et al. SampleRNN: An Unconditional End-to-End Neural Audio Generation Model, 2016, ICLR.
[8] Alan W. Black, et al. Unit selection in a concatenative speech synthesis system using a large speech database, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[9] Navdeep Jaitly, et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Quoc V. Le, et al. HyperNetworks, 2016, ICLR.
[11] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[12] Alex Graves, et al. Generating Sequences With Recurrent Neural Networks, 2013, ArXiv.
[13] Wei Ping, et al. ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech, 2018, ICLR.
[14] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Lior Wolf, et al. Voice Synthesis for in-the-Wild Speakers via a Phonological Loop, 2017, ArXiv.
[16] Yuxuan Wang, et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron, 2018, ICML.
[17] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[18] Leon A. Gatys, et al. Image Style Transfer Using Convolutional Neural Networks, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Yoshua Bengio, et al. Architectural Complexity Measures of Recurrent Neural Networks, 2016, NIPS.
[20] Hideyuki Tachibana, et al. Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] V. Rich. Personal communication, 1989, Nature.
[22] Jae Lim, et al. Signal estimation from modified short-time Fourier transform, 1984.
[23] Kuldip K. Paliwal, et al. Bidirectional recurrent neural networks, 1997, IEEE Trans. Signal Process.
[24] Alan W. Black, et al. Issues in building general letter to sound rules, 1998, SSW.
[25] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.
[26] Fuchun Peng, et al. Grapheme-to-phoneme conversion using Long Short-Term Memory recurrent neural networks, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Richard F. Lyon, et al. Auditory model inversion for sound separation, 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.
[28] Erhardt Barth, et al. Recurrent Dropout without Memory Loss, 2016, COLING.
[29] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.