Semi-Supervised Learning Based on Hierarchical Generative Models for End-to-End Speech Synthesis
Yoshihiko Nankaku | Kei Hashimoto | Shinji Takaki | Takato Fujimoto | Keiichiro Oura | Keiichi Tokuda