Estimating Mutual Information in Prosody Representation for Emotional Prosody Transfer in Speech Synthesis
Tan Lee | Shirong Qiu | Ying Qin | Guangyan Zhang
[1] R. J. Skerry-Ryan, et al. Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron, 2018, ICML.
[2] J. Kinney, et al. Equitability, mutual information, and the maximal information coefficient, 2013, Proceedings of the National Academy of Sciences.
[3] Paul Boersma, et al. Praat, a system for doing phonetics by computer, 2002.
[4] D. W. Robinson, et al. Psychoacoustics: facts and models, 1991.
[5] Alain de Cheveigné, et al. YIN, a fundamental frequency estimator for speech and music, 2002, The Journal of the Acoustical Society of America.
[6] Brian McFee, et al. librosa: Audio and Music Signal Analysis in Python, 2015, SciPy.
[7] Oliver Watts, et al. Towards speaking style transplantation in speech synthesis, 2013, SSW.
[8] Björn Schuller, et al. Computational Paralinguistics, 2013.
[9] Takao Kobayashi, et al. Speech Synthesis with Various Emotional Expressions and Speaking Styles by Style Interpolation and Morphing, 2005, IEICE Trans. Inf. Syst.
[10] Mark J. F. Gales, et al. Speaker and Expression Factorization for Audiobook Data: Expressiveness and Transplantation, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[11] Mark J. F. Gales. Cluster adaptive training of hidden Markov models, 2000, IEEE Trans. Speech Audio Process.
[12] Oliver Niebuhr, et al. Understanding prosody: the role of context, function and communication, 2012.
[13] Mohamed Ishmael Belghazi, et al. Mutual Information Neural Estimation, 2018, ICML.
[14] P. Ekman. An argument for basic emotions, 1992.
[15] Yuxuan Wang, et al. Tacotron: Towards End-to-End Speech Synthesis, 2017, INTERSPEECH.
[16] Mark J. F. Gales, et al. Speech factorization for HMM-TTS based on cluster adaptive training, 2012, INTERSPEECH.
[17] Silvia Quazza, et al. Towards emotional speech synthesis: a rule based approach, 2004, SSW.
[18] A. Leentjens, et al. Disturbances of affective prosody in patients with schizophrenia; a cross sectional study, 1998, Journal of Neurology, Neurosurgery, and Psychiatry.
[19] Dirk Heylen, et al. Generating expressive speech for storytelling applications, 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[20] Wei-Ning Hsu, et al. Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization, 2019, ICASSP.
[21] Yannis Stylianou, et al. Adaptation of an Expressive Single Speaker Deep Neural Network Speech Synthesis System, 2018, ICASSP.
[22] Yamato Ohtani, et al. Emotional transplant in statistical speech synthesis based on emotion additive model, 2015, INTERSPEECH.
[23] Klaus R. Scherer. Vocal communication of emotion: A review of research paradigms, 2003, Speech Commun.
[24] Jennifer Williams, et al. Disentangling Style Factors from Speaker Representations, 2019, INTERSPEECH.
[25] Joseph P. Olive, et al. Text-to-speech synthesis, 1995, AT&T Technical Journal.
[26] M. D. Donsker, et al. Asymptotic evaluation of certain Markov process expectations for large time, 1975.
[27] Shuiyang Mao, et al. Revisiting Hidden Markov Models for Speech Emotion Recognition, 2019, ICASSP.
[28] Ryan Prenger, et al. WaveGlow: A Flow-based Generative Network for Speech Synthesis, 2019, ICASSP.
[29] Michael Wagner, et al. Experimental and theoretical advances in prosody: A review, 2010, Language and Cognitive Processes.
[30] Yuxuan Wang, et al. Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis, 2018, ICML.
[31] Francesc Alías, et al. Prosodic analysis of storytelling discourse modes and narrative situations oriented to text-to-speech synthesis, 2013, SSW.
[32] K. Scherer. Vocal affect expression: a review and a model for future research, 1986, Psychological Bulletin.