CAMP: A Two-Stage Approach to Modelling Prosody in Context
Thomas Merritt | Zack Hodari | Alexis Moinet | Jaime Lorenzo-Trueba | Sri Karlapati | Ammar Abbas | Arnaud Joly | Penny Karanasou | Thomas Drugman
[1] Yusuke Miyao, et al. Learning with Lookahead: Can History-Based Models Rival Globally Optimized Models?, 2011, CoNLL.
[2] Method for the subjective assessment of intermediate quality level of, 2014.
[3] Robert A. J. Clark, et al. A multi-level representation of f0 using the continuous wavelet transform and the Discrete Cosine Transform, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Martti Vainio, et al. Hierarchical Representation of Prosody for Statistical Speech Synthesis, 2015, ArXiv.
[5] Michael Wagner, et al. Toward a bestiary of English intonational contours, 2016.
[6] Oriol Vinyals, et al. Neural Discrete Representation Learning, 2017, NIPS.
[7] Timo Baumann, et al. An Empirical Analysis of the Correlation of Syntax and Prosody, 2018, INTERSPEECH.
[8] Navdeep Jaitly, et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Yuxuan Wang, et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis, 2018, ICML.
[10] Simon King, et al. Using Pupillometry to Measure the Cognitive Load of Synthetic Speech, 2018, INTERSPEECH.
[11] Yuxuan Wang, et al. Predicting Expressive Speaking Style from Text in End-To-End Speech Synthesis, 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[12] Heiga Zen, et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis, 2017, ICML.
[13] Nigel G. Ward. Prosodic Patterns in English Conversation, 2019.
[14] Rob Clark, et al. Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs, 2019, 10th ISCA Workshop on Speech Synthesis (SSW 10).
[15] Chengzhu Yu, et al. DurIAN: Duration Informed Attention Network For Multimodal Synthesis, 2019, ArXiv.
[16] Vincent Wan, et al. CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network, 2019, ICML.
[17] Srikanth Ronanki, et al. Prosody generation for text-to-speech synthesis, 2019.
[18] Oliver Watts, et al. Using generative modelling to produce varied intonation for speech synthesis, 2019, 10th ISCA Workshop on Speech Synthesis (SSW 10).
[19] Ronan Collobert, et al. wav2vec: Unsupervised Pre-training for Speech Recognition, 2019, INTERSPEECH.
[20] Shujie Liu, et al. Neural Speech Synthesis with Transformer Network, 2018, AAAI.
[21] James R. Glass, et al. Towards Transfer Learning for End-to-End Speech Synthesis from Deep Pre-Trained Language Models, 2019, ArXiv.
[22] Srikanth Ronanki, et al. Fine-grained robust prosody transfer for single-speaker neural text-to-speech, 2019, INTERSPEECH.
[23] Tomoki Toda, et al. Pre-Trained Text Embeddings for Enhanced Text-to-Speech Synthesis, 2019, INTERSPEECH.
[24] Sakriani Sakti, et al. The Zero Resource Speech Challenge 2019: TTS without T, 2019, INTERSPEECH.
[25] Thomas Drugman, et al. CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech, 2020, INTERSPEECH.
[26] 知秀 柴田. Understand it in 5 minutes!? Skim-reading famous papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2020.
[27] Perception of prosodic variation for speech synthesis using an unsupervised discrete representation of F0, 2020, Speech Prosody 2020.
[28] Thomas Drugman, et al. Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection, 2019, INTERSPEECH.
[29] Simon King, et al. A Vector Quantized Variational Autoencoder (VQ-VAE) Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis, 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[30] Tao Qin, et al. FastSpeech 2: Fast and High-Quality End-to-End Text to Speech, 2021, ICLR.
[31] Anna Rumshisky, et al. A Primer in BERTology: What We Know About How BERT Works, 2020, Transactions of the Association for Computational Linguistics.