Soroosh Mariooryad | Tom Bagby | Matt Shannon | Eric Battenberg | David Kao | R. J. Skerry-Ryan | Daisy Stanton | Raza Habib
[1] Yuan Jiang, et al. End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training, 2019, 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
[2] Heiga Zen, et al. Hierarchical Generative Modeling for Controllable Speech Synthesis, 2018, ICLR.
[3] Samy Bengio, et al. Generating Sentences from a Continuous Space, 2015, CoNLL.
[4] R. Kubichek, et al. Mel-cepstral distance measure for objective speech quality assessment, 1993, Proceedings of IEEE Pacific Rim Conference on Communications Computers and Signal Processing.
[5] Sercan Ömer Arik, et al. Deep Voice 2: Multi-Speaker Neural Text-to-Speech, 2017, NIPS.
[6] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[7] Mike Lewis, et al. MelNet: A Generative Model for Audio in the Frequency Domain, 2019, ArXiv.
[8] Adam Coates, et al. Deep Voice: Real-time Neural Text-to-Speech, 2017, ICML.
[9] Soroosh Mariooryad, et al. Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis, 2019, ArXiv.
[10] Erich Elsen, et al. Efficient Neural Audio Synthesis, 2018, ICML.
[11] Navdeep Jaitly, et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Yutaka Matsuo, et al. Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder, 2018, INTERSPEECH.
[13] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[14] Aapo Hyvärinen, et al. Nonlinear independent component analysis: Existence and uniqueness results, 1999, Neural Networks.
[15] Sercan Ömer Arik, et al. Deep Voice 3: 2000-Speaker Neural Text-to-Speech, 2017, ICLR 2018.
[16] J. Russell. A circumplex model of affect, 1980.
[17] Andriy Mnih, et al. Disentangling by Factorising, 2018, ICML.
[18] Hideki Kawahara, et al. YIN, a fundamental frequency estimator for speech and music, 2002, The Journal of the Acoustical Society of America.
[19] Heiga Zen, et al. LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech, 2019, INTERSPEECH.
[20] Max Welling, et al. Semi-supervised Learning with Deep Generative Models, 2014, NIPS.
[21] Christopher Burgess, et al. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework, 2016, ICLR 2016.
[22] Sercan Ömer Arik, et al. Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning, 2017, ICLR.
[23] Jakub M. Tomczak, et al. DIVA: Domain Invariant Variational Autoencoders, 2019, DGS@ICLR.
[24] Bernhard Schölkopf, et al. Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations, 2018, ICML.
[25] Yuxuan Wang, et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron, 2018, ICML.
[26] Xi Chen, et al. PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications, 2017, ICLR.
[27] José J. Cañas, et al. Online Measuring of Available Resources, 2017.
[28] Yann LeCun, et al. Disentangling factors of variation in deep representation using adversarial training, 2016, NIPS.
[29] Alex Graves, et al. Generating Sequences With Recurrent Neural Networks, 2013, ArXiv.
[30] Daan Wierstra, et al. Stochastic Backpropagation and Approximate Inference in Deep Generative Models, 2014, ICML.
[31] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[32] Xu Tan, et al. FastSpeech: Fast, Robust and Controllable Text to Speech, 2019, NeurIPS.
[33] Frank D. Wood, et al. Learning Disentangled Representations with Semi-Supervised Deep Generative Models, 2017, NIPS.
[34] Yuxuan Wang, et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis, 2018, ICML.
[35] Vincent Wan, et al. CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network, 2019, ICML.
[36] Lior Wolf, et al. VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop, 2017, ICLR.
[37] Marc Schröder, et al. Emotional speech synthesis: a review, 2001, INTERSPEECH.
[38] Samy Bengio, et al. Tacotron: Towards End-to-End Speech Synthesis, 2017, INTERSPEECH.