Learning Latent Representations for Style Control and Transfer in End-to-end Speech Synthesis
Zhen-Hua Ling | Lei He | Shifeng Pan | Ya-Jie Zhang
[1] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.
[2] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[3] Sercan Ömer Arik,et al. Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning , 2017, ICLR.
[4] Ming Zhou,et al. Close to Human Quality TTS with Transformer , 2018, ArXiv.
[5] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[6] Guillaume Desjardins,et al. Understanding disentangling in β-VAE , 2018, ArXiv.
[7] Yutaka Matsuo,et al. Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder , 2018, INTERSPEECH.
[8] Guillaume Desjardins,et al. Understanding disentangling in β-VAE , 2018, arXiv:1804.03599.
[9] Shujie Liu,et al. Neural Speech Synthesis with Transformer Network , 2018, AAAI.
[10] Yoshua Bengio,et al. Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations , 2016, ICLR.
[11] Samy Bengio,et al. Generating Sentences from a Continuous Space , 2015, CoNLL.
[12] Carl Doersch,et al. Tutorial on Variational Autoencoders , 2016, ArXiv.
[13] Yuxuan Wang,et al. Predicting Expressive Speaking Style from Text in End-To-End Speech Synthesis , 2018, IEEE Spoken Language Technology Workshop (SLT).
[14] Yuxuan Wang,et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron , 2018, ICML.
[15] Kuldip K. Paliwal,et al. Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process.
[16] Christopher Burgess,et al. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework , 2016, ICLR.
[17] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions , 2017, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Yu Zhang,et al. Learning Latent Representations for Speech Generation and Transformation , 2017, INTERSPEECH.
[19] Yuxuan Wang,et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis , 2018, ICML.