Hierarchical RNNs for Waveform-Level Speech Synthesis
Mark J. F. Gales | Zhiyi Ma | Gilles Degottex | Qingyun Dou | Moquan Wan