Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis
Kei Sawada, Kei Hashimoto, Keiichiro Oura, Keiichi Tokuda, Yoshihiko Nankaku, Yukiya Hono, Kazuna Tsuboi