A Variational Prosody Model for Mapping the Context-Sensitive Variation of Functional Prosodic Prototypes
Yi Xu | Philip N. Garner | Branislav Gerazov | Omar Mohammed | Gérard Bailly
[1] Yoshua Bengio,et al. Char2Wav: End-to-End Speech Synthesis , 2017, ICLR.
[2] Gérard Bailly,et al. Evaluating the adequacy of synthetic prosody in signaling syntactic boundaries : methodology and first results , 1998 .
[3] Heiga Zen,et al. Statistical parametric speech synthesis using deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[4] Anne Lacheret,et al. Stylization and Trajectory Modelling of Short and Long Term Speech Prosody Variations , 2011, INTERSPEECH.
[5] Yi Xu. Contextual tonal variations in Mandarin , 1997 .
[6] Yann Morlec. Génération multiparamétrique de la prosodie du français par apprentissage automatique , 1997 .
[7] A Generative Adversarial Network for Style Modeling in a Text-to-Speech System , 2018 .
[8] Ivan Fónagy,et al. Clichés mélodiques , 1983 .
[9] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[10] Yu Tsao,et al. Speech enhancement based on deep denoising autoencoder , 2013, INTERSPEECH.
[11] Gérard Bailly,et al. SFC: A trainable prosodic model , 2005, Speech Commun.
[12] Gérard Bailly,et al. A superposed prosodic model for Chinese text-to-speech synthesis , 2004, 2004 International Symposium on Chinese Spoken Language Processing.
[13] Christopher Burgess,et al. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework , 2016, ICLR 2016.
[14] Wade Junek,et al. Mind Reading: The Interactive Guide to Emotions , 2007 .
[15] Heiga Zen,et al. Hierarchical Generative Modeling for Controllable Speech Synthesis , 2018, ICLR.
[16] Yuxuan Wang,et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron , 2018, ICML.
[17] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[18] Geoffrey E. Hinton,et al. Binary coding of speech spectrograms using a deep auto-encoder , 2010, INTERSPEECH.
[19] Heiga Zen,et al. Statistical parametric speech synthesis: from HMM to LSTM-RNN , 2015 .
[20] Gaël Varoquaux,et al. Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res.
[21] Lior Wolf,et al. VoiceLoop: Voice Fitting and Synthesis via a Phonological Loop , 2017, ICLR.
[22] Michael Wagner,et al. Toward a bestiary of English intonational contours , 2016 .
[23] Stefano Ermon,et al. InfoVAE: Balancing Learning and Inference in Variational Autoencoders , 2019, AAAI.
[24] Frank K. Soong,et al. Modeling F0 trajectories in hierarchically structured deep neural networks , 2016, Speech Commun.
[25] Geoffrey E. Hinton,et al. Adaptive Mixtures of Local Experts , 1991, Neural Computation.
[26] Zhizheng Wu,et al. Merlin: An Open Source Neural Network Speech Synthesis System , 2016, SSW.
[27] Yuxuan Wang,et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis , 2018, ICML.
[28] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .
[29] Fang Liu,et al. Parallel Encoding of Focus and Interrogative Meaning in Mandarin Intonation , 2005, Phonetica.
[30] Yi Xu,et al. Speech melody as articulatorily implemented communicative functions , 2005, Speech Commun.
[31] G. Bailly,et al. Learning the Hidden Structure of Speech: From Communicative Functions to Prosody , 2011 .
[32] Jeffrey Pennington,et al. Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions , 2011, EMNLP.
[33] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[34] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[35] Gérard Bailly,et al. A Weighted Superposition of Functional Contours Model for Modelling Contextual Prominence of Elementary Prosodic Contours , 2018, INTERSPEECH.
[36] Cheng-Yuan Liou,et al. Autoencoder for words , 2014, Neurocomputing.
[37] I. Biederman. Recognition-by-components: a theory of human image understanding , 1987, Psychological Review.
[38] Eva Gårding,et al. A Generative Model of Intonation , 1983 .
[39] Pieter Abbeel,et al. Variational Lossy Autoencoder , 2016, ICLR.
[40] Xin Wang,et al. An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech Synthesis , 2017, INTERSPEECH.
[41] Adam Coates,et al. Deep Voice: Real-time Neural Text-to-Speech , 2017, ICML.
[42] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[43] Keiichi Tokuda,et al. Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis , 1999, EUROSPEECH.
[44] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Paul W. Munro,et al. Principal Components Analysis Of Images Via Back Propagation , 1988, Other Conferences.
[46] Gérard Bailly,et al. The significance of scope in modelling tones in Chinese , 2018 .
[47] Daniel Hirst,et al. Form and function in the representation of speech prosody , 2005, Speech Commun.
[48] Nicolas Obin,et al. Sparse Coding of Pitch Contours with Deep Auto-Encoders , 2018, Speech Prosody 2018.
[49] Hugo Larochelle,et al. An Autoencoder Approach to Learning Bilingual Word Representations , 2014, NIPS.
[50] Yi Xu,et al. Effects of tone and focus on the formation and alignment of f0 contours , 1999 .
[51] R. Wilcox. Introduction to Robust Estimation and Hypothesis Testing , 1997 .
[52] Gérard Bailly,et al. Talking Machines: Theories, Models, and Designs , 1992 .
[53] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[54] Samy Bengio,et al. Generating Sentences from a Continuous Space , 2015, CoNLL.
[55] Mari Ostendorf,et al. ToBI: a standard for labeling English prosody , 1992, ICSLP.
[56] Gérard Bailly,et al. Generating prosodic attitudes in French: Data, model and evaluation , 2001, Speech Commun.