[1] Timo Baumann, et al. An Empirical Analysis of the Correlation of Syntax and Prosody, 2018, INTERSPEECH.
[2] Max Welling, et al. Auto-Encoding Variational Bayes, 2013, ICLR.
[3] Simon King, et al. A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural $F_0$ Model for Statistical Parametric Speech Synthesis, 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[4] Thomas Drugman, et al. Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection, 2019, INTERSPEECH.
[5] P. Prieto, et al. Preschoolers use prosodic mitigation strategies to encode polite stance, 2018, Speech Prosody.
[6] Vincent Wan, et al. CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network, 2019, ICML.
[7] Masanori Morise, et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications, 2016, IEICE Trans. Inf. Syst.
[8] José Ignacio Hualde, et al. Listening for sound, listening for meaning: Task effects on prosodic transcription, 2014.
[9] Keiichi Tokuda, et al. Speech parameter generation algorithms for HMM-based speech synthesis, 2000, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
[10] Srikanth Ronanki, et al. A Template-Based Approach for Speech Synthesis Intonation Generation Using LSTMs, 2016, INTERSPEECH.
[11] Henrik Niemann, et al. Integrating the discreteness and continuity of intonational categories, 2017, J. Phonetics.
[12] Joseph Roy, et al. Crowd-sourcing prosodic annotation, 2017, Comput. Speech Lang.
[13] Adam J. Royer, et al. Prominence perception is dependent on phonology, semantics, and awareness of discourse, 2017.
[14] Johanna D. Moore, et al. Paragraph-based prosodic cues for speech synthesis applications, 2016.
[15] Oliver Watts, et al. Using generative modelling to produce varied intonation for speech synthesis, 2019, ArXiv.
[16] P. Prieto, et al. The contribution of context and contour to perceived belief in polar questions, 2015.
[17] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[18] Zhizheng Wu, et al. Sentence-level control vectors for deep neural network speech synthesis, 2015, INTERSPEECH.
[19] Simon King, et al. The Blizzard Challenge 2008, 2008.
[20] Nigel G. Ward. Prosodic Patterns in English Conversation, 2019.
[21] Xin Wang, et al. Initial investigation of encoder-decoder end-to-end TTS using marginalization of monotonic hard alignments, 2019, 10th ISCA Workshop on Speech Synthesis (SSW 10).
[22] Michael Wagner, et al. Toward a bestiary of English intonational contours, 2016.
[23] Xin Wang, et al. Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis, 2018, ArXiv.
[24] Yuxuan Wang, et al. Predicting Expressive Speaking Style from Text in End-To-End Speech Synthesis, 2018, IEEE Spoken Language Technology Workshop (SLT).
[25] Petra Wagner, et al. The Greennn Tree - Lengthening Position Influences Uncertainty Perception, 2019, INTERSPEECH.
[26] Mireia Farrús, et al. Using Prosody to Classify Discourse Relations, 2017, INTERSPEECH.
[27] Joakim Gustafson, et al. Spontaneous Conversational Speech Synthesis from Found Data, 2019, INTERSPEECH.
[28] Junichi Yamagishi, et al. HMM-Based Expressive Speech Synthesis - Towards TTS with Arbitrary Speaking Styles and Emotions, 2003.
[29] Catherine Lai, et al. What do you mean, you're uncertain?: the interpretation of cue words and rising intonation in dialogue, 2010, INTERSPEECH.
[30] Kenneth Ward Church, et al. Text Analysis and Word Pronunciation in Text-to-speech Synthesis, 2013.
[31] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[32] Gustav Eje Henter, et al. Casting to Corpus: Segmenting and Selecting Spontaneous Dialogue for TTS with a CNN-LSTM Speaker-dependent Breath Detector, 2019, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] S. Calhoun. The centrality of metrical structure in signaling information structure: A probabilistic perspective, 2010.
[35] Max Welling, et al. VAE with a VampPrior, 2017, AISTATS.
[36] Olac Fuentes, et al. Inferring stance in news broadcasts from prosodic-feature configurations, 2018, Comput. Speech Lang.
[37] Lei He, et al. Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS, 2019, INTERSPEECH.
[38] Valerie Freeman, et al. Prosodic features of stances in conversation, 2019, Laboratory Phonology: Journal of the Association for Laboratory Phonology.
[39] Yuxuan Wang, et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis, 2018, ICML.