Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning
Sercan Ömer Arik | John Miller | Wei Ping | Sharan Narang | Jonathan Raiman | Andrew Gibiansky | Kainan Peng | Ajay Kannan
[1] Yann Dauphin,et al. Convolutional Sequence to Sequence Learning , 2017, ICML.
[2] Lior Wolf,et al. Voice Synthesis for in-the-Wild Speakers via a Phonological Loop , 2017, ArXiv.
[3] Adam Coates,et al. Deep Voice: Real-time Neural Text-to-Speech , 2017, ICML.
[4] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[5] Simon King,et al. Thousands of Voices for HMM-Based Speech Synthesis–Analysis and Application of TTS Systems Built on Various ASR Corpora , 2009, IEEE Transactions on Audio, Speech, and Language Processing.
[6] Jae Lim,et al. Signal estimation from modified short-time Fourier transform , 1984, IEEE Transactions on Acoustics, Speech, and Signal Processing.
[7] Heiga Zen,et al. Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis , 2009, IEEE Transactions on Audio, Speech, and Language Processing.
[8] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.
[9] Colin Raffel,et al. Online and Linear-Time Attention by Enforcing Monotonic Alignments , 2017, ICML.
[10] Jason Weston,et al. A Neural Attention Model for Abstractive Sentence Summarization , 2015, EMNLP.
[11] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Zhizheng Wu,et al. Siri On-Device Deep Learning-Guided Unit Selection Text-to-Speech System , 2017, INTERSPEECH.
[13] Masanori Morise,et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications , 2016, IEICE Trans. Inf. Syst..
[14] Yoshua Bengio,et al. Char2Wav: End-to-End Speech Synthesis , 2017, ICLR.
[15] Paul Taylor. Text-to-Speech Synthesis , 2009, Cambridge University Press.
[16] Yann Dauphin,et al. Language Modeling with Gated Convolutional Networks , 2016, ICML.
[17] Yannis Agiomyrgiannakis,et al. Vocaine the vocoder and applications in speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[19] Sercan Ömer Arik,et al. Deep Voice 2: Multi-Speaker Neural Text-to-Speech , 2017, NIPS.
[20] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[21] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[22] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[23] Tim Salimans,et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks , 2016, NIPS.
[24] Alexander Gutkin,et al. Recent Advances in Google Real-Time HMM-Driven Unit Selection Synthesizer , 2016, INTERSPEECH.
[25] Hideki Kawahara,et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds , 1999, Speech Commun..
[26] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[27] Cha Zhang,et al. CROWDMOS: An approach for crowdsourcing mean opinion score studies , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[28] Yoshua Bengio,et al. SampleRNN: An Unconditional End-to-End Neural Audio Generation Model , 2016, ICLR.