Char2Wav: End-to-End Speech Synthesis
Yoshua Bengio | Aaron C. Courville | Kundan Kumar | Jose Sotelo | Kyle Kastner | João Felipe Santos | Soroush Mehri
[1] Alan W. Black, et al. Unit selection in a concatenative speech synthesis system using a large speech database, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[2] Yoshua Bengio, et al. A Neural Probabilistic Language Model, 2003, J. Mach. Learn. Res.
[3] Heiga Zen, et al. Statistical Parametric Speech Synthesis, 2007, IEEE International Conference on Acoustics, Speech, and Signal Processing.
[4] Paul Taylor, et al. Text-to-Speech Synthesis, 2009.
[5] Joaquim Llisterri, et al. The Corpus DIMEx100: transcription and evaluation, 2010, Lang. Resour. Evaluation.
[6] Alex Graves, et al. Practical Variational Inference for Neural Networks, 2011, NIPS.
[7] Heiga Zen, et al. Statistical parametric speech synthesis using deep neural networks, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[8] Alex Graves, et al. Generating Sequences With Recurrent Neural Networks, 2013, ArXiv.
[9] Heiga Zen, et al. Speech Synthesis Based on Hidden Markov Models, 2013, Proceedings of the IEEE.
[10] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[11] Navdeep Jaitly, et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks, 2014, ICML.
[12] Alex Graves, et al. Recurrent Models of Visual Attention, 2014, NIPS.
[13] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[14] Simon King, et al. Measuring a decade of progress in Text-to-Speech, 2014.
[15] Heiga Zen, et al. Statistical parametric speech synthesis: from HMM to LSTM-RNN, 2015.
[16] Zhizheng Wu, et al. A study of speaker adaptation for DNN-based speech synthesis, 2015, INTERSPEECH.
[17] Yoshua Bengio, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015, ICML.
[18] Yoshua Bengio, et al. Blocks and Fuel: Frameworks for deep learning, 2015, ArXiv.
[19] Fuchun Peng, et al. Grapheme-to-phoneme conversion using Long Short-Term Memory recurrent neural networks, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Heiga Zen, et al. Directly modeling speech waveforms by neural networks for statistical parametric speech synthesis, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Frank K. Soong, et al. Multi-speaker modeling and speaker adaptation for DNN-based TTS synthesis, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Heiga Zen, et al. Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends, 2015, IEEE Signal Processing Magazine.
[23] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[24] Yoshua Bengio, et al. Attention-Based Models for Speech Recognition, 2015, NIPS.
[25] Geoffrey Zweig, et al. Sequence-to-sequence neural net models for grapheme-to-phoneme conversion, 2015, INTERSPEECH.
[26] Samy Bengio, et al. Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks, 2015, NIPS.
[27] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[28] Heiga Zen, et al. Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices, 2016, INTERSPEECH.
[29] Ole Winther, et al. Neural Machine Translation with Characters and Hierarchical Encoding, 2016, ArXiv.
[30] Quoc V. Le, et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition, 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Heiga Zen, et al. Directly modeling voiced and unvoiced components in speech waveforms by neural networks, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] John Salvatier, et al. Theano: A Python framework for fast computation of mathematical expressions, 2016, ArXiv.
[33] Zhizheng Wu, et al. Merlin: An Open Source Neural Network Speech Synthesis System, 2016, SSW.
[34] Heiga Zen, et al. Multi-Language Multi-Speaker Acoustic Modeling for LSTM-RNN Based Statistical Parametric Speech Synthesis, 2016, INTERSPEECH.
[35] Masanori Morise, et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications, 2016, IEICE Trans. Inf. Syst.
[36] Junichi Yamagishi, et al. SUPERSEDED - CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit, 2016.
[37] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[38] Stephen DiVerdi, et al. Cute: A concatenative method for voice conversion using exemplar-based unit selection, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Yoshua Bengio, et al. SampleRNN: An Unconditional End-to-End Neural Audio Generation Model, 2016, ICLR.