[1] Heiga Zen,et al. Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Moncef Gabbouj,et al. Ways to Implement Global Variance in Statistical Speech Synthesis , 2012, INTERSPEECH.
[3] Masanori Morise,et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications , 2016, IEICE Trans. Inf. Syst.
[4] Zhizheng Wu,et al. Merlin: An Open Source Neural Network Speech Synthesis System , 2016, SSW.
[5] Srikanth Ronanki,et al. Learning Interpretable Control Dimensions for Speech Synthesis by Using External Data , 2018, INTERSPEECH.
[6] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .
[7] Zack Hodari,et al. A learned emotion space for emotion recognition and emotive speech synthesis , 2017 .
[8] Yutaka Matsuo,et al. Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder , 2018, INTERSPEECH.
[9] Mark J. F. Gales,et al. Speech intonation for TTS: study on evaluation methodology , 2014, INTERSPEECH.
[10] Honglak Lee,et al. Learning Structured Output Representation using Deep Conditional Generative Models , 2015, NIPS.
[11] Simon King,et al. Measuring a decade of progress in Text-to-Speech , 2014 .
[12] C. Bishop. Mixture density networks , 1994 .
[13] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[14] Heiga Zen,et al. Hierarchical Generative Modeling for Controllable Speech Synthesis , 2018, ICLR.
[15] Lukasz Kaiser,et al. Attention Is All You Need , 2017, NIPS.
[16] Jason Tyler Rolfe,et al. Discrete Variational Autoencoders , 2016, ICLR.
[17] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[18] Li-Rong Dai,et al. Fast Adaptation of Deep Neural Network Based on Discriminant Codes for Speech Recognition , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[19] Yi Xu. Speech Prosody: A Methodological Review , 2011 .
[20] K. Arrow. A Difficulty in the Concept of Social Welfare , 1950, Journal of Political Economy.
[21] Heiga Zen,et al. Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices , 2016, INTERSPEECH.
[22] P. Groenen,et al. Modern Multidimensional Scaling: Theory and Applications , 1999 .
[23] Andrew Rosenberg,et al. AuToBI - a tool for automatic ToBI annotation , 2010, INTERSPEECH.
[24] S. King,et al. The Blizzard Challenge 2013 , 2013, The Blizzard Challenge 2013.
[25] Yuxuan Wang,et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis , 2018, ICML.
[26] Yu Tsao,et al. Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks , 2017, INTERSPEECH.
[27] Oriol Vinyals,et al. Neural Discrete Representation Learning , 2017, NIPS.
[28] Simon King,et al. Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech , 2014, INTERSPEECH.
[29] Simon King,et al. The Blizzard Challenge 2008 , 2008 .
[30] Simon King,et al. Statistical analysis of the Blizzard Challenge 2007 listening test results , 2007 .
[31] Max Welling,et al. VAE with a VampPrior , 2017, AISTATS.
[32] Heiga Zen,et al. Statistical parametric speech synthesis using deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[33] Vincent Wan,et al. CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network , 2019, ICML.
[34] Keiichi Tokuda,et al. Speech parameter generation algorithms for HMM-based speech synthesis , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[35] Xin Wang,et al. Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis , 2018, ArXiv.
[36] Yuxuan Wang,et al. Predicting Expressive Speaking Style from Text in End-To-End Speech Synthesis , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[37] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[38] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[39] Heiga Zen,et al. Statistical Parametric Speech Synthesis , 2007, IEEE International Conference on Acoustics, Speech, and Signal Processing.
[40] Junichi Yamagishi,et al. Adapting and controlling DNN-based speech synthesis using input codes , 2017, ICASSP.
[41] Roddy Cowie,et al. Emotional speech: Towards a new generation of databases , 2003, Speech Commun.
[42] Lars Hertel,et al. Approximate Inference for Deep Latent Gaussian Mixtures , 2016 .
[43] Christopher Burgess,et al. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework , 2016, ICLR 2016.
[44] Zhizheng Wu,et al. Sentence-level control vectors for deep neural network speech synthesis , 2015, INTERSPEECH.
[45] Victor Ungureanu,et al. Experiments with Training Corpora for Statistical Text-to-speech Systems , 2018, INTERSPEECH.